Variable-Length 
Error-Correcting Codes 



A thesis submitted to the University of Manchester 
for the degree of Doctor of Philosophy in the Faculty of Science 

1995 

Victor Buttigieg 

Department of Electrical Engineering 



Contents 



Title Page 1 

Contents 2 

List of Figures 8 

List of Tables 12 

Abstract 14 

Declaration 15 

Copyright Notice 16 

About Author 17 

Acknowledgements 18 

List of Abbreviations 19 

Glossary of Symbols 20 

Dedication 23 

1. Introduction 24

1.1. Introduction 24 

1.2. Combined Source and Channel Coding 26 

1.3. Variable-Length Error-Correcting Codes 27 

1.4. Thesis Structure 29 






2. Variable-Length Error-Correcting Codes 31 

2.1. Introduction 31 

2.2. Variable-Length Codes 31 

2.3. Some Properties of Variable-Length Codes 33 

2.3.1. Non-Singular Codes 33 

2.3.2. Unique Decodability 33

2.3.3. Instantaneously Decodable Variable-Length Codes 35 

2.3.4. Exhaustive Codes 36 

2.3.5. Code Efficiency and Redundancy 36 

2.4. Synchronisation 36 

2.4.1. Synchronisation Schemes using a Marker 38 

2.4.2. Synchronisable Codes 39 

2.4.2.1. Comma-Free Codes 40 

2.4.3. Statistically Synchronisable Codes 42 

2.5. α-Correcting Codes 45

2.6. α-Prompt Codes 48

2.6.1. Prefix Decoding Algorithm 48 

2.6.2. Segment Decomposition 49 

2.6.3. Segment Decoding Algorithm 52 

2.7. Two-Length Error-Correcting Codes 53 

2.8. Instantaneous Decoding using the Massey Metric 53 

2.9. Symbol Error Probability 57 

2.9.1. Levenshtein Distance 58

2.9.2. A Practical Algorithm to Evaluate the Symbol Error Probability 59 






2.10. Synchronisation-Error-Correcting Codes 60 

2.11. Conclusion 61 

3. Trellis Structure of Variable-Length Error-Correcting Codes 63

3.1. Introduction 63 

3.2. Maximum Likelihood Decoding 64 

3.2.1. Tree Structure 64 

3.2.2. Trellis Structure 67 

3.2.2.1. Trellis Construction Algorithm 68

3.2.2.2. Modified Viterbi Algorithm 69 

3.3. Maximum A-Posteriori Metric 71 

3.4. Some Properties of VLEC codes 74 

3.4.1. Free Distance 74

3.4.2. Constraint Length 76 

3.4.3. Catastrophic Codes 77 

3.5. Performance 79 

3.5.1. Union Bounds 79 

3.5.1.1. Evaluating the Distance Spectrum 84

3.5.2. Simulation 86 

3.5.3. Comparing Simulation Results and the Union Bound 87 

3.5.4. Comparing Maximum Likelihood and MAP Decoding 91 

3.5.5. Comparing Maximum Likelihood and Instantaneous Decoding 94 

3.6. Decoding Window Depth 97 

3.7. Complexity 99

3.8. Conclusion 101 






4. Sequential Decoding 103 

4.1. Introduction 103 

4.2. Metric for Sequential Decoding 104 

4.3. Stack Algorithm 105 

4.4. Performance 109 

4.4.1. Column Distance Function 109 

4.4.2. Sequentially Catastrophic VLEC Codes 113 

4.4.3. Simulation Results 115 

4.5. Complexity 121 

4.6. Conclusion 125 

5. Synchronisation Properties 128 

5.1. Introduction 128 

5.2. Average Error Span on the Binary Symmetric Channel 128 

5.3. Synchronisation Recovery without Start of Message 133 

5.4. Synchronisation Recovery on Channels with Symbol Deletions and Insertions 136

5.4.1. Symbol Deletions 136

5.4.2. Symbol Insertions 139

5.5. Conclusion 140

6. Code Constructions 143

6.1. Introduction 143

6.2. Linear VLEC Codes 144 






6.2.1. Vertically Linear VLEC Codes 145 

6.2.2. Horizontally Linear VLEC Codes 146 

6.3. Code-Anticode Construction 147 

6.4. Heuristic Construction Algorithm 152 

6.4.1. Choosing a Good Fixed-Length Coset Code from a Given Set of Words 157 

6.4.2. Choosing a Good Fixed-Length (Non-Linear) Code from a Given Set of Words 158

6.4.3. Deleting a Codeword 159 

6.5. Comparing Constructions 160 

6.5.1. Two-Length Error-Correcting Codes and VLEC Codes 164 

6.6. Comparing Performance of VLEC Codes with Standard Coding Techniques 165

6.7. Conclusion 172

7. Conclusion 175

7.1. Scope for Further Research 178

7.2. Some New Ideas 179

7.2.1. State-Splitting Variable-Length Error Correcting Codes 180 

7.2.2. Finite State Variable-Length Error-Correcting Codes 181 

References 185 

Appendix A 

VLEC Codes for the 26-Symbol English Source and the 128-Symbol ASCII Source 191






Appendix B 

Two Algorithms to Calculate the Distance Spectrum of VLEC Codes 197 

Appendix C 

Published Papers 213 

Index 246






List of Figures 



Figure 1.1: Standard basic digital communication system 25 

Figure 1.2: Combined source and channel coding 27 

Figure 3.1: First tree segment for a VLEC code C 65

Figure 3.2: Tree diagram for code C4 up to length 12 bits 66

Figure 3.3: Trellis diagram for code C4 69

Figure 3.4: Alternative trellis diagram for code C4 69

Figure 3.5: Proof of maximum likelihood decoding 71

Figure 3.6: Proof of Theorem 3.3 76

Figure 3.7: Catastrophic behaviour of C5 78

Figure 3.8: The three possible error events interactions 83

Figure 3.9: Symbol error probability curves for C3 88

Figure 3.10: Symbol error probability curves for C7 90

Figure 3.11: Comparisons between MAP and maximum likelihood decoding for C8 94

Figure 3.12: Performance comparisons for the α1-prompt code given in Table A.1 95

Figure 3.13: Performance comparisons for the α1,1-prompt code given in Table A.1 96

Figure 3.14: Effect of decoding window depth on the performance of VLEC code C15 99






Figure 4.1: Evolution of stack contents in example 4.1 108 

Figure 4.2: Computing the CDF for a VLEC code 112 

Figure 4.3: Necessary condition for correct decoding with sequential decoding 114

Figure 4.4: Column distance function for the three codes C9, C10 and C11 115

Figure 4.5: Comparing performance of maximum likelihood and sequential decoding for codes C9, C10 and C11 116

Figure 4.6: Column distance function for the three codes C15, C16 and C17 118

Figure 4.7: Comparison between maximum likelihood and sequential decoding for codes C15 and C17 118

Figure 4.8: Effect of stack size on performance of sequential decoding for code C17 119

Figure 4.9: Performance of C17 with exact and approximate metrics 120

Figure 4.10: Performance of C9 with exact and approximate metrics 120

Figure 4.11: Number of extended paths for codes C9, C10, and C11 122

Figure 4.12: Extra extended paths for codes C9, C10 and C11 123

Figure 4.13: Extra extended paths for codes C15, C16, and C17 125

Figure 5.1: Variation of average effective error span with cross-over probability for C17 132

Figure 5.2: Incorrect synchronisation after n initial bits lost 134

Figure 5.3: Effective error span for code C17 with normally assigned initial path metrics 135

Figure 5.4: Performance for codes C15, C16, and C17 for a number of consecutive bits deleted after bit position 200 137

Figure 5.5: Synchronisation recovery under a single bit deletion 138 






Figure 5.6: Probability distribution of the effective error span for codes C15, C16, and C17 139

Figure 5.7: Performance for codes C18 and C19 for a number of consecutive bits deleted after bit position 200 140

Figure 5.8: Performance for codes C15, C16, and C17 for a number of consecutive random bits inserted after bit position 200 141

Figure 6.1: Horizontal and vertical sub-codes of a VLEC code 145

Figure 6.2: Generator matrix for (13,5,5) fixed-length linear block code 149

Figure 6.3: Rearranged generator matrix for the (13,5,5) fixed-length linear block code with (3,5,2) anticode in the rightmost position 150

Figure 6.4: Codebook for (13,5,5) code and the derived VLEC code, C13 151

Figure 6.5: VLEC code C14 (8@10,5; 8@11,5; 16@12,5; 3,2) and its horizontal linear sub-codes 152

Figure 6.6: Heuristic construction algorithm for VLEC codes 156

Figure 6.7: Comparing the performance of a two-length error-correcting code with VLEC codes 164

Figure 6.8: Free/Minimum distance 5 codes used to encode the 26-symbol English source 165

Figure 6.9: Free/Minimum distance 7 codes used to encode the 26-symbol English source 168

Figure 6.10: Free/Minimum distance 5 codes used to encode the 128-symbol ASCII source 169

Figure 6.11: Free/Minimum distance 7 codes used to encode the 128-symbol ASCII source 170

Figure 7.1: SSVLEC code of order two 181






Figure 7.2: Finite state diagram for VLEC code C6 with output branch labels of length L^ 182

Figure 7.3: Finite state diagram for VLEC code C6 with output branch labels of length L a 183






List of Tables 



Table 2.1: Synchronous code for eight-symbol source 44 

Table 2.2: An α-correcting code C2 47

Table 2.3: An eight-codeword α1-prompt code C3 56

Table 2.4: Possible tails for code C3 56

Table 2.5: Probabilities for all codewords of C3 given that we receive 10011110 57

Table 3.1: VLEC Code C4 65

Table 3.2: An example for a catastrophic VLEC code C5 78

Table 3.3: Simple VLEC code C6 79

Table 3.4: Distance spectrum for code C3 up to state S33 90

Table 3.5: Code C7 90

Table 3.6: Distance spectrum for code C7 up to state S30 91

Table 3.7: Different source probabilities for code C8 94

Table 4.1: Two-codeword codes with average codeword length of 6.5 bits for a uniform source 116

Table 4.2: Required CDF growth to satisfy condition given by expression (4.12) for codes C9, C10 and C11 117






Table 4.3: Percentage computational load for codes C15, C16 and C17 with the sequential decoding algorithm as compared with the modified Viterbi algorithm 126

Table 5.1: Huffman code given in Maxted and Robinson [1985] 131

Table 6.1: Comparing the number of codewords found using the greedy algorithm and the majority voting algorithm when W contains all possible n-tuples 159

Table 6.2: Various codes for the 26-symbol English source constructed using different algorithms 162

Table 6.3: Horizontally linear VLEC codes for the 26-symbol English source with dfree = 5 with various numbers of sub-codes 163

Table 6.4: Variation of average codeword length with dmin for the heuristic construction 163

Table 6.5: Comparing number of computations required for convolutional and VLEC codes 171

Table A.1: α1-prompt and α1,1-prompt codes for the 26-symbol English source 191

Table A.2: Various VLEC codes for the 26-symbol English source with dfree = 5 192

Table A.3: Two VLEC codes constructed using the heuristic construction with the majority voting algorithm for the 26-symbol English source 193

Table A.4: Two VLEC codes for the 128-symbol ASCII source derived from a C-program 196






Abstract 



Variable-length error-correcting (VLEC) codes are considered for combined source 
and channel coding. Instantaneous decoding algorithms for VLEC codes treated previously 
in the literature are found to suffer from loss of synchronisation over the binary symmetric 
channel, consequently resulting in poor performance. A novel maximum likelihood 
decoding algorithm, based on a modified form of the Viterbi algorithm, is derived for these 
codes by considering the spatial memory due to their variable-length nature. This decoding 
algorithm achieves a large coding gain (from 1 to 3 dB) over the instantaneous algorithms 
because of its good synchronisation properties. 

The performance of these codes with maximum likelihood decoding when compared 
to standard cascaded source and channel coding schemes with similar parameters is found 
to be slightly better (about 0.5 dB gain). However, the decoding complexity for VLEC 
codes is greater. This problem is solved by implementing a sequential decoding strategy, 
which, for almost the same performance, offers a much reduced computational effort 
(about an order of magnitude less) when the signal-to-noise ratio on the channel is 
relatively high. 

The synchronisation performance of VLEC codes with maximum likelihood 
decoding over channels which admit symbol deletion or insertion errors is also found to be 
good (synchronisation is recovered within less than two source symbols following an 
error). 

Various properties of VLEC codes influencing their performance both with maximum 
likelihood and sequential decoding are defined and characterised. A union bound on their 
performance over the binary symmetric channel is derived. Several different constructions 
for VLEC codes are given, one of which optimises the average codeword length for a given 
source while attaining the required error-correcting power. 




Declaration 



No portion of the work referred to in the thesis has been submitted in support of an 
application for another degree or qualification of this or any other university or other 
institute of learning. 






Copyright Notice 



(1) Copyright in text of this thesis rests with the Author. Copies (by any process) either 
in full, or of extracts, may be made only in accordance with instructions given by 
the Author and lodged in the John Rylands University Library of Manchester. 
Details may be obtained from the Librarian. This page must form part of any such 
copies made. Further copies (by any process) of copies made in accordance with 
such instructions may not be made without the permission (in writing) of the 
Author. 

(2) The ownership of any intellectual property rights which may be described in this 
thesis is vested in the University of Manchester, subject to any prior agreement to 
the contrary, and may not be made available for use by third parties without the 
written permission of the University, which will prescribe the terms and conditions 
of any such agreement. 






About Author 



Victor Buttigieg received the B.Elec.Eng. (Hons) degree in Electrical Engineering 
from the University of Malta in 1990. For a brief period he taught at the Fellenberg 
Technical Institute for Industrial Electronics, Malta, before joining the Faculty of Electrical 
and Mechanical Engineering at the University of Malta as an Assistant Lecturer. In 1991 
he was awarded a three-year Commonwealth Academic Staff Scholarship by the 
Association of Commonwealth Universities at the University of Manchester. He was 
awarded the degree of M.Sc. in Digital Systems Engineering from the University of 
Manchester in 1992. 






Acknowledgements 



First of all I wish to thank Professor P.G. Farrell, my supervisor, for the excellent way 
in which he guided me throughout this project. Without his constant encouragement, sense 
of humour and many hours of stimulating discussions this work would not have reached its 
present form. 

To all fellow students and staff in the Communications Research Group at the 
University of Manchester goes my sincere thank you. The many interesting discussions, 
technical and non-technical, and friendship made life in Manchester that much easier. My 
special thanks go to Jon Larrea, who for many months was my dependable link with 
Manchester. 

I also wish to thank the staff of the Department of Communications and Computer 
Engineering at the University of Malta, for giving me time to complete this thesis during 
the last few months back at my department. 

I also acknowledge the financial support of the Association of Commonwealth 
Universities and the British Council, through the award of a Commonwealth Academic 
Staff Scholarship. 

Last but not least I want to thank my wife, Rose Marie, and my son, Darren, for all 
the time, love and support they give me, without which this thesis would not have been 
possible. 






List of Abbreviations 



ASCII       American Standard Code for Information Interchange
AWGN        Additive White Gaussian Noise
BCH         Bose-Chaudhuri-Hocquenghem
BMS         Binary Memory-less Source
BPSK        Binary Phase Shift Keying
BSC         Binary Symmetric Channel
BSD         Bounded Synchronisation Delay
CAS         Compare Add Select
CDF         Column Distance Function
FSVLEC      Finite State Variable-Length Error-Correcting
GA          Greedy Algorithm
GCD         Greatest Common Divisor
HDLC        High-level Data Link Control
iff         If and only if
LHS         Left Hand Side
MAP         Maximum A-Posteriori
MLD         Maximum Likelihood Decoding
MVA         Majority Voting Algorithm
NASA        National Aeronautics and Space Administration
RHS         Right Hand Side
RS          Reed-Solomon
SEP         Symbol Error Probability
SSVLEC      State Splitting Variable-Length Error-Correcting
VLEC        Variable-Length Error-Correcting
VLSI        Very Large Scale Integration






Glossary of Symbols 



∀           For all
α           Admissibility (or Error) mapping
X           Index of comma freedom
δ           Maximum distance for an anticode
            Forced transition without any input symbols
0MAP        MAP factor
η           Code efficiency
ηm          Total number of source symbols in message m
κ           Positive constant
λ           Empty word
            The minimum number of consecutive states in the repetitive part of the trellis diagram for a VLEC code C required to build up all subsequent states
e           Positive constant
σ           Number of different codeword lengths
V           Average number of information bits required to encode a source symbol
ξ           Average number of paths which visit the top of the stack per transmitted source symbol
A           Fixed-length anticode
A           Information source
Ah          The average number of converging pairs of paths at Hamming distance h
a           Sequence of source symbols
a           Source symbol
            Codewords of F
Bh          Average Levenshtein distance between all converging pairs of paths whose encoded messages are at a Hamming distance h from each other
bk          Minimum block distance for codewords of length Lk
bmin        Overall minimum block distance
C           VLEC code
C(a, b)     Converging distance between a and b
Cm          Average number of source symbols in all converging pairs of paths whose encoded messages are at a Hamming distance h from each other
c           Code symbol
ci          A codeword of code C
cmin        Minimum converging distance
D           The number of states in the decoding window
D(a, b)     Diverging distance between a and b
d           Minimum distance for a block code
            CDF






dfree       Free distance
dmin        Minimum diverging distance
du          Unequal length free distance
E           Expanded code for VLEC code under error mapping α
Eseff       Average effective error span
Es          Average error span
Ex          Average number of extra paths extended per source symbol more than the minimum number
e           Number of errors
F           Fixed-length code
F(um, y)    Fano metric for message um given received sequence y
FN          Extended code of order N for VLEC code C
fi          A codeword of the extended code FN
fN          Cardinality of FN
Gqr         The set of all pairs of path segment indices corresponding to path segments which diverge at state Sq and merge again for the first time at state Sr
g           The GCD of the codeword lengths of C
H(A)        Entropy of source A
H(a, b)     Hamming distance between a and b
Hi          Horizontal sub-code
h           Hamming distance between two given codewords
K           Constraint length
k           Information vector
k           Number of information bits
L           Codeword length for fixed-length code
L(a, b)     Levenshtein distance between a and b
Laverage    Average codeword length for a given code and source
Li          The ith different codeword length
li          Length of codeword ci
Mi          The metric value for state Si
m           Block length for anticode
mj          The metric for codeword cj
N           Number of bits in the encoded message
Nm          Number of bits in encoded message m
Ns          The total number of states in the trellis
n           Block length
n           Number of bits lost at the start of the message
ni          Number of codewords in a given path with length Li
p           Cross-over probability for the BSC
p           Proper prefix of a word
P(a)        Probability of occurrence of a
P(E)        Error event probability
P(E, r)     The error event probability at bit position r
P0          The probability measure induced on the channel output alphabet when the channel inputs are used according to some probability distribution Q( )
Pf(E)       The first error event probability at any bit position
Pf(E, r)    The first error event probability at bit position r
Ph          Probability of decoding a sequence into another sequence at distance h over the BSC






Pm          Probability of message m
pm          Transmitted path through tree
PN          The set of paths through trellis (or tree) of length N bits
Ps(E)       Symbol error probability
pi          The ith path through tree
p m N       Minimum distance path to state SN
{pqs) A     The first fit branches of the path segment p l q r
pi          The ith path through trellis going to state Sr
p*          The segment of the path pi from state Sq to state Sr
Q( )        Probability distribution for channel input alphabet
Qi          Segment decomposition of a VLEC code
q           Codeword segment
q           Number of code symbols
q, r        Bit position
R           Code rate
Si          State in trellis diagram representing bit position i
s           Number of codewords/source symbols
s           Proper suffix of a word
si          Number of codewords with length Li
?,          Number of codewords with length less than Li
T           Maximum decoding delay
t           Number of correctable errors per segment or per codeword
um          Codeword sequence corresponding to message m
v           State label for a FSVLEC code
Vi          Vertical sub-code
W           Allowed set of words satisfying given conditions
W           Decoding window depth in bits
W(a)        Hamming weight for a
w, x, y 5 x Words over X
Yt d        Received word
X           Code alphabet
xi          Code symbol
y           Received bit sequence
Z           Maximum synchronisable delay
z           Synchronisable delay






To Rose Marie and little Darren 






Chapter 1. 

Introduction 



1.1. Introduction 

The aim of any communication system is to transmit information from some source 
at point A, to some sink at point B over some channel. The channel can take several forms, 
such as a physical cable, a wireless link or even a storage device. The communication 
system is successful if it transmits this information faithfully and efficiently. If the 
information source is digital in nature, such as computer data for instance, then we may 
require that the reproduction of the information at the sink will be an exact replica of the 
source data. In other instances, especially for analogue information sources, such as 
speech, we allow some distortion at the sink. This distortion may result from noise on the 
communication channel or even from the way the source is transmitted. For instance, 
transmitting a speech signal over a digital communication system will always introduce 
some distortion, even for a noiseless channel, since in order to digitise the signal a finite 
number of quantisation levels must be used. Currently, most communication systems are 
being implemented using digital technology due to a host of advantages, the most 
important of which are its ease of implementation using VLSI technology and its superior 
performance in noise. Digital transmission also allows a host of signal processing 
techniques which would otherwise be impossible or difficult to implement in analogue 
form. This thesis treats one such technique, to perform combined source and error- 
correction coding. 






Figure 1.1 shows the basic block diagram for a digital communication system, where 
the source is already assumed to be in digital form [Viterbi & Omura, 1979]. The source 
encoder is used to remove as much as possible of the redundancy present in most natural 
information sources, performing what is commonly known as data compression. The 
source encoding could either be a one-to-one mapping, in which case the source may be 
reproduced exactly if the source coded data is transmitted over an error free channel, or it 
could be a many-to-one mapping. In this latter case, although better compression may be 
achieved, the source can only be recovered within some fidelity criterion. In this thesis we 
are always going to assume that the source is discrete, memory-less and stationary, i.e. the 
probability of occurrence of any source symbol is independent of previously emitted 
symbols and independent of time. Also, we are only going to consider distortionless source 
coding. 



[Block diagram: Source → Source Encoder → Channel Encoder → Modulator → Channel (with Noise) → Demodulator → Channel Decoder → Source Decoder → Sink; the Modulator, Channel and Demodulator together form the Discrete Channel]

Figure 1.1: Standard basic digital communication system 



In practice, noise is usually present, in one form or another, on any communication 
channel. In a digital system this translates into errors in the received sequence of symbols. 
There are several techniques one could adopt in order to reduce these errors, such as 
increasing the signal power, reducing the transmission speed, and so on. One of the 
techniques which has been gaining ground over the last several years is that of 
error-correction, whereby the source coded data is further encoded using a so-called 
error-correction code [Lin & Costello, 1983]. The channel encoder may involve other levels of 
coding (such as run-length limited encoding for magnetic recording, for instance 
[Schouhamer Immink, 1990]). Here, however, we shall use error-correction and channel 
coding as synonymous. Error correction is achieved by introducing structured redundancy 
in the data. Through this redundancy, the decoder can determine that errors have occurred 
during a transmission (error-detection) or, more powerfully, which of the transmitted 
symbols are in error (error-correction). Even from this simplistic overview, the dual nature 
of source and channel coding is already apparent: source coding removes redundancy for 
efficient transmission, whereas channel coding re-introduces redundancy to combat errors on 
the channel. This duality is in fact much deeper, as established by Shannon's source and 
channel coding theorems [Shannon, 1948]. 
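
As a rough illustration of how structured redundancy allows both error-detection and error-correction, the sketch below uses a three-fold repetition code. This code is chosen here purely for simplicity and is not one of the codes studied in this thesis; the function names are illustrative assumptions.

```python
# Minimal sketch, assuming a 3-fold repetition code (not a code from the thesis).

def encode(bits, n=3):
    """Repeat every information bit n times (the structured redundancy)."""
    return [b for b in bits for _ in range(n)]

def flip(bits, positions):
    """Model channel errors by inverting the bits at the given positions."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out

def decode(received, n=3):
    """Majority-vote each block of n bits.

    Returns (decoded_bits, detected), where detected[i] is True when block i
    is not a valid codeword, i.e. an error has been detected in that block.
    """
    blocks = [received[i:i + n] for i in range(0, len(received), n)]
    detected = [len(set(block)) > 1 for block in blocks]
    decoded = [int(sum(block) > n // 2) for block in blocks]
    return decoded, detected

data = [1, 0, 1]
sent = encode(data)                  # [1, 1, 1, 0, 0, 0, 1, 1, 1]
received = flip(sent, [1, 5])        # one error in each of the first two blocks
decoded, detected = decode(received)
print(decoded, detected)             # [1, 0, 1] [True, True, False]
```

A single error per block is corrected, whereas two errors in the same block are still detected but are decoded wrongly, which shows why detection is the weaker of the two capabilities.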

1.2. Combined Source and Channel Coding 

Since source coding is removing redundancy and channel coding is reintroducing it, 
albeit in a different form, we may ask whether it is better to combine these two operations into 
a single operation, as shown in Figure 1.2. However, a direct consequence of Shannon's 
work is precisely that these two operations may be separated without any loss in 
performance, for most common sources and channels. This has become known as the 
separation theorem. Note, however, that the separation theorem does not hold for certain 
classes of sources and/or channels [Vembu et al., 1995]. 

Separating the two operations has the advantage that if the source is changed in a 
system, the only component that needs to be modified is the source encoder/decoder pair. 
Similarly, if the characteristics of the channel change, then it is only the channel 
encoder/decoder pair that needs to be replaced. Shannon's work gives little insight, 
however, into how complex the system may become by separating the two operations. His 
work does not preclude the possibility that by combining the two, the overall system 
complexity for a given performance may be reduced. Massey [1978] has investigated this 
problem for the special case of combined linear source and channel coding applied to a 
binary memoryless source (BMS) and a binary symmetric channel (BSC). He has found 
that for the distortionless case, a combined source and channel linear encoder is simpler to 
implement and is just as optimal as separate encoders. Interestingly enough, this result does not 
hold when some distortion is allowed at the decoder, in which case the combined scheme 
is sub-optimal. Obviously, once the two encoders are combined, there is the 
disadvantage that the system becomes less flexible, in that any change in the source and/or 
channel statistics will entail a change in the combined encoder (and decoder). 



[Block diagram: Source → Combined Source-Channel Encoder → Modulator → Channel (with Noise) → Demodulator → Combined Source-Channel Decoder; the Modulator, Channel and Demodulator together form the Discrete Channel]

Figure 1.2: Combined source and channel coding 



1.3. Variable-Length Error-Correcting Codes 

Error-correcting codes can be broadly classified as block or convolutional [Lin & 
Costello, 1983]. The main difference between the two lies in the fact that in block codes, k 
symbols of information are mapped into n code symbols, not necessarily from the same 
alphabet, where n > k. In convolutional codes, a similar mapping takes place, but in this 
case it also depends on previous inputs. Hence, convolutional codes have memory. The 
amount of memory, or constraint length [Viterbi, 1971], will determine the performance. 
For this reason, in convolutional codes n and k are usually very small: k = 1 and n = 2 are 
normal for these codes, whereas the corresponding values in the case of block codes are of 
the order of a few hundreds, since here it is the block length which determines the 
performance¹. For instance, an n = 255, k = 223, denoted by (255,223), Reed-Solomon 
(RS) code with 8-bit symbols is a standard block code used in deep space communication 
by NASA [Sweeney, 1991]. 
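
To make the (n, k) notation concrete, the following sketch computes the code rate k/n for the two examples quoted above. It also uses the standard property, not stated in the text, that a Reed-Solomon code has minimum distance n - k + 1 and can therefore correct up to (n - k)/2 symbol errors per block. The function names are illustrative assumptions.

```python
from fractions import Fraction

def code_rate(n, k):
    """Rate of a code mapping k information symbols to n code symbols."""
    return Fraction(k, n)

def rs_correctable_symbols(n, k):
    """Symbol errors correctable by an (n, k) Reed-Solomon code.

    Assumes the standard result that RS codes are maximum distance
    separable, so d_min = n - k + 1 and t = floor((n - k) / 2).
    """
    return (n - k) // 2

# The (255,223) RS code with 8-bit symbols mentioned in the text.
n, k, bits_per_symbol = 255, 223, 8
print(code_rate(n, k))               # 223/255
print(rs_correctable_symbols(n, k))  # 16
print(n * bits_per_symbol)           # 2040 bits per block

# A typical convolutional code with k = 1 and n = 2.
print(code_rate(2, 1))               # 1/2
```

The contrast in block size (2040 bits against 2 bits per encoding step) reflects the point made above: block codes rely on block length, whereas convolutional codes rely on memory.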

In this thesis we examine a new class of error-correcting codes which we shall call 
VLEC (Variable-Length Error-Correcting) codes. As the name implies, the main difference 
between these codes and the standard block and convolutional codes is the fact that the 
codewords are of variable length. The codes that we investigate here are similar to block 
codes in the respect that each codeword is mapped to a given set of information symbols 
irrespective of the previous inputs. However, their main characteristics are very similar to 
those of convolutional codes. This similarity is brought about by the fact that the position 
of any codeword within the encoded message depends on the previously occurring 
codewords and hence VLEC codes exhibit a form of "spatial memory". In the case of 
convolutional codes the memory in the encoder directly affects the value of the output, but 
not its position. 
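
The notion of spatial memory can be sketched with a small example. The four-codeword prefix code below is hypothetical and has no error-correcting capability; it is used only to show that the bit position of each codeword depends on the codewords that precede it, even though the codeword chosen for a symbol does not.

```python
# Hypothetical variable-length code, for illustration only (not a thesis code).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def codeword_positions(message, code=CODE):
    """Return (symbol, start_bit) for every codeword in the encoded message."""
    positions = []
    start = 0
    for symbol in message:
        positions.append((symbol, start))
        start += len(code[symbol])  # the next codeword starts where this one ends
    return positions

# Changing only the first symbol leaves the later codewords unchanged,
# but moves them to different bit positions.
print(codeword_positions("acd"))  # [('a', 0), ('c', 1), ('d', 4)]
print(codeword_positions("ccd"))  # [('c', 0), ('c', 3), ('d', 6)]
```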

In addition, due to their variable length nature, VLEC codes may be used to perform 
combined source and channel coding by assigning the shorter codewords to the more 
probable source symbols. In the encodings considered in this thesis, each source symbol is 
mapped to a single codeword according to the source statistics. However, this could easily 
be extended to have multiple source symbols mapped to single codewords in order to 
increase the code efficiency by increasing the number of codewords in the VLEC code. 
The number of source symbols mapped to each codeword may not necessarily be fixed, 
resulting in variable-to-variable length encoding. 
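
The effect of assigning shorter codewords to more probable symbols can be sketched as follows. The source probabilities and the codewords are hypothetical, and the code is an ordinary prefix code without error-correcting redundancy; a VLEC code would be expected to have a larger average length than the source entropy because of the redundancy it adds.

```python
import math

def assign_by_probability(probabilities, codewords):
    """Pair the most probable symbols with the shortest codewords."""
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    words = sorted(codewords, key=len)
    return dict(zip(symbols, words))

def average_length(probabilities, code):
    """Average codeword length in bits per source symbol."""
    return sum(p * len(code[s]) for s, p in probabilities.items())

def entropy(probabilities):
    """Entropy of a memoryless source in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities.values() if p > 0)

# Hypothetical four-symbol source and codeword set.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
words = ["110", "0", "111", "10"]

good = assign_by_probability(probs, words)
bad = {"a": "111", "b": "110", "c": "10", "d": "0"}  # deliberately reversed

print(average_length(probs, good))  # 1.75
print(average_length(probs, bad))   # 2.625
print(entropy(probs))               # 1.75
```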



¹ Since a large block length averages out the effects of noise and enables good codes to be constructed. 




1.4. Thesis Structure 

The first published work on VLEC codes that we are aware of is that edited by 
Hartnett [1974], who compiled a series of reports originating at Parke Mathematical 
Laboratories, Massachusetts, U.S.A., between 1957 and 1968. Since then not much has 
appeared apart from recent work by Dunscombe [1988], Bernard and Sharma [1990] and 
Escott [1995]. This work is treated in Chapter 2, which also includes a general exposition 
of variable-length codes. However, all the previous work on VLEC codes has completely 
ignored the spatial memory inherent in VLEC codes. Consequently, the performance of 
these codes with the published decoding algorithms is not very good, which may partly 
explain why they are not treated much in the literature. The spatial memory of VLEC 
codes was first considered by Buttigieg [1992]. This work is improved upon in Chapter 3, 
where a maximum likelihood decoding algorithm for these codes is derived. This 
algorithm offers substantial coding gain over the earlier decoding algorithms for VLEC 
codes, but this gain comes at the price of increased complexity. This drawback is tackled in 
Chapter 4, where a sequential decoding algorithm based on the stack algorithm for 
convolutional codes is given. It is shown that, for reasonably large signal-to-noise ratios, 
the decoding complexity is greatly reduced relative to that of the maximum likelihood 
decoding algorithm, while the same performance is maintained. 

One of the main problems with variable-length codes in general is that of loss of 
synchronisation. In our opinion this is another reason why VLEC codes were not 
considered much in the literature. The main objective of error-correcting codes is to reduce 
the effect of errors on the channel. Loss of synchronisation has the opposite effect whereby 
errors on the channel may be propagated by the decoder. This is especially evident with the 
decoding algorithms found in the literature. In Chapter 5 we show that VLEC codes with 
maximum likelihood decoding have reasonably good synchronisation properties over the 
BSC and will also perform well over channels which allow deletion or insertion of channel 
symbols. These kinds of errors are especially problematic in the case of standard error- 
correcting codes due to their fixed-length nature. 






Having determined which properties of VLEC codes influence their performance in 
the earlier chapters, Chapter 6 discusses issues involved with their construction and gives 
two construction algorithms. Codes for the 26-symbol English source and the 128-symbol 
ASCII source are constructed and their performance compared with standard error- 
correcting codes with and without source coding. 

Finally, in Chapter 7 we draw some conclusions on the performance of VLEC codes 
for combined source and channel coding. We also give some new ideas to improve their 
performance and list some open problems. 






Chapter 2. 

Variable-Length Error-Correcting Codes 



2.1. Introduction 

Variable-length codes are normally used for source coding. Consequently, they are 
frequently considered in conjunction with noiseless channels. Hence, we will first review 
some properties that characterise variable-length codes for the noiseless case. 

The problem of synchronisation is then considered, both in the general case and in 
particular for variable-length codes. This is the main problem area for variable-length 
codes, which limits their use in practice. We will then consider variable-length codes 
capable of correcting substitution errors. In particular, the special class of α-prompt codes 
is considered in detail, and three instantaneous decoding algorithms for these codes are 
given. 

Most of the work presented in this chapter has appeared previously in the literature, 
as will be indicated. However, there are a few extensions of previous work in Sections 2.6, 
2.8 and 2.9. In particular, in Section 2.9.2, a practical algorithm to determine the symbol 
error probability in the case of variable-length codes is given. This is suitable for use in 
computer simulations to determine the performance of variable-length codes. 

2.2. Variable-Length Codes 

Let $X$ be a code alphabet with cardinality $q$. A finite sequence $w = x_1 x_2 \cdots x_l$ of code 
symbols is called a word over $X$ of length $|w| = l$, where $x_i \in X$ for all $i = 1, 2, \ldots, l$. 
Denote the set of all finite-length words over $X$ by $X^+$. Note that if $\lambda$ denotes the empty 
word, $\lambda \notin X^+$. Let $X^* = X^+ \cup \{\lambda\}$. Given $w, p, s \in X^+$, if $w = ps$, then $p$ is a proper prefix of 
$w$ and $s$ is a proper suffix of $w$. A set $C$ of words is called a code. Note that $C \subset X^+$. 
Similarly, denote the set of all finite-length sequences of codewords of $C$ by $C^+$ and let 
$C^* = C^+ \cup \{\lambda\}$. 

Let the code $C$ have $s$ codewords $\{c_1, c_2, \ldots, c_s\}$ and let $l_i = |c_i|$, $i = 1, 2, \ldots, s$. 
Without loss of generality, assume that $l_1 \le l_2 \le \cdots \le l_s$. Further, let $\sigma$ denote the number of 
different codeword lengths in the code $C$ and let these lengths be $L_1, L_2, \ldots, L_\sigma$, where 
$L_1 < L_2 < \cdots < L_\sigma$. Let the number of codewords with length $L_k$ be $s_k$, and the number of 
codewords with length less than $L_k$ be $t_k$, i.e. $t_k = \sum_{i=1}^{k-1} s_i$. Note that $t_1 = 0$ and that 
$L_1 = l_1$, $L_2 = l_{t_2+1}$, $\ldots$, $L_\sigma = l_{t_\sigma+1} = l_s$ and $\sum_{k=1}^{\sigma} s_k = s$. We shall use 
$(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ to denote such a code. We shall later expand this notation for 
the case of variable-length error-correcting (VLEC) codes (cf. Chapter 3). 

If $\sigma = 1$, then $C$ is a fixed-length code. Hence, we shall define a variable-length code 
$C$ to be a code with $\sigma > 1$. Further, if $q = 2$, then the code will be binary and $X$ could be 
taken to be the set {0, 1}. Unless otherwise stated, it will be assumed throughout this 
thesis that $C$ is a binary variable-length code. However, most of the results obtained may 
easily be extended to non-binary codes. 

The Hamming weight (or simply weight) of a word $w$, $W(w)$, is the number of non-zero 
symbols in $w$. The Hamming distance (or simply distance) between two equal-length 
words, $H(w_1, w_2)$, is the number of positions in which $w_1$ and $w_2$ differ. For the binary case, 
it is easy to see that $H(w_1, w_2) = W(w_1 + w_2)$, where the addition is modulo-2. 
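For binary words stored as strings of '0' and '1', these definitions translate directly into code. The following Python sketch (the function names are chosen here for illustration only) checks the identity between distance and the weight of the modulo-2 sum on two example words.

```python
def hamming_weight(w: str) -> int:
    """Number of non-zero symbols in the word w."""
    return sum(ch != "0" for ch in w)


def hamming_distance(w1: str, w2: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(w1) != len(w2):
        raise ValueError("Hamming distance is defined for equal-length words")
    return sum(a != b for a, b in zip(w1, w2))


def add_mod2(w1: str, w2: str) -> str:
    """Symbol-wise modulo-2 sum of two equal-length binary words."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(w1, w2))


w1, w2 = "01110", "10011"
# For binary words the distance equals the weight of the modulo-2 sum.
assert hamming_distance(w1, w2) == hamming_weight(add_mod2(w1, w2))
```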

Let $A$ be a memoryless data source with $s$ source symbols $\{a_1, a_2, \ldots, a_s\}$, each with 
probability of occurrence $P(a_i)$, $i = 1, 2, \ldots, s$, with $\sum_{i=1}^{s} P(a_i) = 1$. Without loss of 
generality, assume that $P(a_1) \ge P(a_2) \ge \cdots \ge P(a_s)$. The source $A$ is encoded using code $C$ 
by mapping symbol $a_i$ to codeword $c_i$ for all $i = 1, 2, \ldots, s$. It is easy to prove that this 
mapping is the most efficient given code $C$ and source $A$. In this case, the average 
codeword length is given by 

$$L_{\text{average}} = \sum_{i=1}^{s} P(a_i)\, l_i . \qquad (2.1)$$



2.3. Some Properties of Variable-Length Codes 

If we compare variable-length codes to fixed-length codes, we find that the former 
are much more difficult to deal with, since in this case there is also a degree of ambiguity in 
determining the codeword boundaries. In the case of fixed-length codes, once we know 
where a codeword starts, then from that point onwards it is very easy to determine the 
subsequent boundaries, assuming that the channel does not insert or delete code symbols. 
In the previous statement, there are two important "ifs", which when not satisfied will give 
big problems in the case of fixed-length codes. We shall comment further on this in 
Section 2.4. 

2.3.1. Non-Singular Codes 

A code C is said to be non-singular if all the codewords in the code are distinct 
[Abramson, 1963]. Both fixed and variable-length codes must satisfy this property in order 
to be useful. This property is trivial to check. 

2.3.2. Unique Decodability 

A code C is said to be uniquely decodable if we can map a string of codewords 
unambiguously back to the correct source symbols. It is obvious that all fixed-length codes 
which are non-singular are uniquely decodable. However, this is not in general true for 
variable-length codes. We will show this with an example. Consider the code {0, 01, 10} 
used to encode the source {a, b, c}. Clearly this is a non-singular code since all codewords 
are distinct. However, the message ac, which is encoded as 010, cannot be uniquely 
decoded since the codeword sequence 010 may be decoded either as ac or as ba. 
Necessary and sufficient conditions for unique decodability and an algorithm to test these 
conditions are given by Sardinas and Patterson [1953]. Hazeltine [1963] gives an 
alternative algorithm to determine if a code is uniquely decodable. 
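As an illustration, a test in the style of Sardinas and Patterson can be sketched in Python. It repeatedly forms "dangling suffixes" (what is left over when one word is a prefix of another) and declares the code not uniquely decodable as soon as a dangling suffix is itself a codeword. The function names and the string representation of codewords are assumptions made here for illustration and are not taken from the cited papers.

```python
def dangling_suffixes(A, B):
    """Non-empty words w such that a + w = b for some a in A and b in B."""
    return {b[len(a):] for a in A for b in B
            if len(b) > len(a) and b.startswith(a)}


def is_uniquely_decodable(code) -> bool:
    words = list(code)
    C = set(words)
    if len(C) != len(words) or "" in C:
        return False                      # singular code or empty codeword
    seen = set()
    current = dangling_suffixes(C, C)     # suffixes left by codeword pairs
    while current:
        if current & C:                   # a dangling suffix is a codeword
            return False
        seen |= current
        nxt = dangling_suffixes(C, current) | dangling_suffixes(current, C)
        current = nxt - seen              # only explore new suffixes
    return True


# The example from the text: 010 parses both as 0,10 and as 01,0.
assert not is_uniquely_decodable(["0", "01", "10"])
```

The loop terminates because every dangling suffix is a suffix of some codeword, so only finitely many can ever appear.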

A uniquely decodable code $C$ has finite decoding delay $T$ iff there exists an integer $T$ 
such that if $x \in X^+$, $|x| > T$, and $xy \in C^+$, then $x$ has a decomposition $x = x_1 x_2$ such that 
whenever $xz \in C^+$ then $x_1 \in C$ and $x_2 z \in C^*$; in words, iff the first $T$ code symbols in a 
message are sufficient to determine the first codeword. A variable-length code may, for 
certain messages, exhibit an infinite decoding delay and hence will not be suitable for 
practical use. Several people have worked on the determination of the decoding delay for 
variable-length codes. The first to give an algorithm to calculate this was Even [1963]. 

A necessary and sufficient condition for the existence of a uniquely decodable code is 
provided by the McMillan inequality [McMillan, 1956]. 

Theorem 2.1: A necessary and sufficient condition for the existence of a uniquely 
decodable code with codeword lengths $l_1, l_2, \ldots, l_s$ is that 

$$\sum_{i=1}^{s} q^{-l_i} \le 1 \qquad (2.2)$$

where $q$ is the number of different code symbols. 

Proof. [Abramson, 1963] The sufficient part is proved by construction. From 
expression (2.2) we obtain a series of inequalities on the number of codewords, $s'_i$, of a 
given length $i$: 

$$s'_{L_\sigma} \le q^{L_\sigma} - s'_1 q^{L_\sigma - 1} - s'_2 q^{L_\sigma - 2} - \cdots - s'_{L_\sigma - 1}\, q \qquad (2.3)$$
$$\vdots$$
$$s'_3 \le q^3 - s'_1 q^2 - s'_2 q \qquad (2.4)$$
$$s'_2 \le q^2 - s'_1 q \qquad (2.5)$$
$$s'_1 \le q \qquad (2.6)$$

Assuming the codeword lengths satisfy (2.2) and using expressions (2.3)-(2.6) we may 
construct a uniquely decodable code as follows. We require $s'_1 \le q$ codewords of length 
one. Since there are $q$ code symbols, we may choose any arbitrary set of $s'_1$ distinct 
code symbols as codewords of length one. One way of ensuring that the code be uniquely 
decodable is to enforce that the remaining codewords start with different code symbols, 
thus creating a prefix code (see the next section). Hence there is the possibility of forming 
$(q - s'_1)q$ codewords of length two. However, expression (2.5) ensures that we do not need 
more than this number, so the construction is possible. In fact, all the other codeword 
lengths can be constructed in such a manner. 






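For the necessary part of the McMillan inequality, consider the expression 

$$\left( \sum_{i=1}^{s} q^{-l_i} \right)^{n} = \left( q^{-l_1} + q^{-l_2} + \cdots + q^{-l_s} \right)^{n} \qquad (2.7)$$

where $n$ is some positive integer. Expanding the RHS of equation (2.7) we obtain $s^n$ terms, 
each of the form $q^{-k}$, where $k$ is a sum of $n$ codeword lengths and can take values from $n l_1$ to 
$n l_s$. Hence 

$$\left( \sum_{i=1}^{s} q^{-l_i} \right)^{n} = \sum_{k = n l_1}^{n l_s} N_k\, q^{-k} \qquad (2.8)$$

where $N_k$ is the number of terms in the expansion of the form $q^{-k}$. But $N_k$ is also the 
number of sequences of $n$ codewords containing exactly $k$ code symbols. Hence, for the code to be 
uniquely decodable, this number must not exceed $q^k$, i.e. 

$$\left( \sum_{i=1}^{s} q^{-l_i} \right)^{n} \le \sum_{k = n l_1}^{n l_s} q^{k} q^{-k} = n (l_s - l_1) + 1 . \qquad (2.9)$$

The inequality given by expression (2.9) must hold for all $n$, including very large $n$. If the 
sum on the left exceeded one, the left-hand side would grow exponentially with $n$ while the 
right-hand side grows only linearly, so 

$$\sum_{i=1}^{s} q^{-l_i} \le 1 . \qquad (2.2)$$

■ 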

2.3.3. Instantaneously Decodable Variable-Length Codes 

For practical applications, it is required that the decoding delay be as small as 
possible. The minimum decoding delay possible is obtained when a codeword is decodable as 
soon as it is completely received. Anything less than this would imply that the code is 
redundant. A code with such a property is called an instantaneously decodable code. It is 
obvious that for a code to have this property, no codeword may be a prefix of another 
codeword. Hence, these codes are also known as prefix codes. It follows that prefix codes are 
uniquely decodable with decoding delay at most $l_s$ (the maximum codeword length). 

Interestingly enough, McMillan's inequality given by (2.2) is also a necessary and 
sufficient condition for the existence of a prefix code with codeword lengths $l_1, l_2, \ldots, l_s$. In 
this case, it is better known as the Kraft inequality [Kraft, 1949]. Chronologically, the 
proof of this inequality came before that of McMillan's. 
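The constructive half of the argument can be sketched in Python. The code below evaluates the left-hand side of (2.2) exactly and, when the inequality holds, assigns codewords in order of increasing length so that no codeword is a prefix of another. This is one standard way of carrying out such a construction and is not claimed to be the procedure used in the cited proofs; the function names are hypothetical, and codewords are written as digit strings, which assumes $q \le 10$.

```python
from fractions import Fraction


def kraft_sum(lengths, q=2):
    """Exact value of the sum of q**(-l) over the codeword lengths."""
    return sum(Fraction(1, q ** l) for l in lengths)


def prefix_code_from_lengths(lengths, q=2):
    """Build a prefix code with the given lengths, if (2.2) allows it."""
    if kraft_sum(lengths, q) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code, value, prev_len = [], 0, 0
    for l in sorted(lengths):
        value *= q ** (l - prev_len)      # extend the counter to length l
        digits, v = [], value
        for _ in range(l):                # write value as l base-q digits
            digits.append(str(v % q))
            v //= q
        code.append("".join(reversed(digits)))
        value += 1                        # next unused word of this length
        prev_len = l
    return code


print(prefix_code_from_lengths([1, 2, 3, 3]))
```

Note that the inequality concerns existence only: the lengths (1, 2, 2) of the code {0, 01, 10} satisfy (2.2) with equality, although that particular code is not uniquely decodable.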






2.3.4. Exhaustive Codes 

A code $C$ is said to be exhaustive iff any $x \in X^+$ can be unambiguously decomposed 
into a sequence of codewords ending with a complete codeword or a prefix of a codeword, 
i.e. $x = c_{i_1} c_{i_2} \cdots c_{i_m} w$ for some $m$, where $c_{i_j} \in C$, $j = 1, 2, \ldots, m$, and $wy \in C$ for some $y \in X^*$. 
Note that an exhaustive code is uniquely decodable iff it is also a prefix code. 

2.3.5. Code Efficiency and Redundancy 

Shannon's first theorem [Shannon, 1948] states that the average information of a 
source symbol is $H(A)$, the source entropy, given by 

$$H(A) = -\sum_{i=1}^{s} P(a_i) \log_q P(a_i) \qquad (2.10)$$

and that, for a uniquely decodable code, $L_{\text{average}} \ge H(A)$. 
Accordingly, the code efficiency, $\eta$, is defined as 

$$\eta = \frac{H(A)}{L_{\text{average}}} , \qquad (2.11)$$

while the code redundancy is defined as 

$$\text{Redundancy} = 1 - \eta = \frac{L_{\text{average}} - H(A)}{L_{\text{average}}} . \qquad (2.12)$$

Given a memoryless source $A$, Huffman [1952] derived an algorithm to construct a 
code with the maximum possible efficiency. Codes constructed using this algorithm are 
known as Huffman codes, which besides having the minimum possible redundancy, are 
also exhaustive codes (and hence have the prefix property [Stiffler, 1971]). 
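A minimal Python sketch of these quantities for a binary code follows. The four-symbol source is hypothetical and was chosen with probabilities that are powers of two, so that the Huffman code reaches an efficiency of exactly one. The function names are illustrative; only codeword lengths are computed, which is all that the efficiency requires.

```python
import heapq
from math import log2


def entropy(probs):
    """Source entropy in bits per source symbol (binary code, q = 2)."""
    return -sum(p * log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given source."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, leaves in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for i in leaves1 + leaves2:       # merged leaves move one level down
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, leaves1 + leaves2))
        counter += 1
    return lengths


probs = [0.5, 0.25, 0.125, 0.125]         # hypothetical source
lengths = huffman_lengths(probs)
l_average = sum(p * l for p, l in zip(probs, lengths))
efficiency = entropy(probs) / l_average
print(lengths, l_average, efficiency)
```

For sources whose probabilities are not powers of two, the efficiency of a symbol-by-symbol Huffman code is in general below one.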

2.4. Synchronisation 

Synchronisation is of fundamental importance in digital communications. There are 
basically three levels of synchronisation that need to be taken care of. At the most basic 
level the receiver must have phase (in the case of coherent detection) or frequency (in the 
case of non-coherent detection) synchronisation with the carrier wave. The next level of 
synchronisation required is that of symbol synchronisation. This is required at the receiver 
so that the symbol detection interval is accurately aligned to that in the carrier, otherwise 
the ability to make accurate symbol decisions will be degraded. In most communication 
systems, an even higher level of synchronisation is required, termed frame synchronisation. 
Loss of frame synchronisation is said to occur when the decoder does not correctly 
determine codeword boundaries. Here, we are only interested in the latter type of 
synchronisation and from now onwards the term synchronisation will be understood to 
mean frame or codeword synchronisation. 

We may consider two types of synchronisation problems. 

1. At the start of the transmission, the receiver loses the initial channel symbols, and 
hence the decoder does not know where the first complete codeword received 
starts. 

2. The decoder is assumed to be already in synchronisation. However, noise on the 
channel causes errors in the symbols supplied to the decoder, possibly resulting in 
loss of synchronisation. There are three types of errors which need to be considered. 

(i) Substitution error (e.g. a '0' transformed to a '1' and vice versa). 

(ii) Deletion error (a code symbol in the original message is deleted). 

(iii) Insertion error (an extra code symbol is inserted in the decoded message). 

We can consider case (1) above as initial acquisition of synchronisation. This may be 
treated separately from case (2) since other mechanisms may be brought into play to 
acquire synchronisation, depending on the transmission protocol being used. For example, 
in HDLC (High-level Data Link Control) [Tanenbaum, 1988] a special flag sequence 
(01111110) is transmitted continuously before the start of the actual message to facilitate 
synchronisation. 

On the other hand, acquisition of synchronisation may be considered as a special case 
of (2). In this case, the initial code symbols in the received message may be considered to 
have been deleted. 

On channels without feedback, for instance, the decoder must acquire 
synchronisation, and maintain it, without notifying the transmitter of loss of 
synchronisation. In this case, it is required that the system can automatically regain 
synchronisation. Here, we are only going to consider the latter type of scenario. There are 
several schemes one may adopt to achieve this objective, depending on whether 
fixed-length or variable-length codes are being used. 

Case (1) applies both for fixed and variable-length codes. However, case (2) is not 
equally applicable for both types of codes. If the decoder is in synchronisation and the 
noise on the channel causes a symbol to be corrupted into another symbol (substitution 
error), then in the case of fixed-length codes, no loss of synchronisation occurs. However, 
this is not so in the case of variable-length codes. A substitution error may cause a 
codeword of length $l_i$ to be decoded as a codeword of length $l_j$, with $l_i \ne l_j$. This will cause 
a loss of synchronisation. Deletion and insertion errors may cause loss of synchronisation 
in both fixed and variable-length codes. 
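The effect of a single substitution error on a variable-length code can be seen with a small Python sketch. The four-codeword prefix code and the message used here are hypothetical and chosen only for illustration.

```python
# Hypothetical prefix code, used only to illustrate loss of synchronisation.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(message: str, code=CODE) -> str:
    return "".join(code[sym] for sym in message)


def decode(bits: str, code=CODE):
    """Instantaneous decoding; returns (symbols, undecoded remainder)."""
    inverse = {cw: sym for sym, cw in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:            # a complete codeword was received
            out.append(inverse[current])
            current = ""
    return "".join(out), current


sent = encode("cab")                      # 110 0 10
received = "0" + sent[1:]                 # substitution error in first bit
print(decode(sent))                       # the original three symbols
print(decode(received))                   # four symbols: boundaries moved
```

One flipped bit turns the three-symbol message into a four-symbol one, because the codeword of length three is decoded as a codeword of length one and the following boundaries shift accordingly.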

2.4.1. Synchronisation Schemes using a Marker 

One of the simplest schemes to adopt in order to acquire and maintain 
synchronisation is to periodically insert a special symbol $\notin X$, called a sync pulse or 
marker. Each time the decoder receives this special marker, it knows that the 
next symbol is the start of a codeword. Hence, if the decoder is out of synchronisation, it 
will acquire synchronisation as soon as it receives a marker. To improve the performance, 
it may also be necessary that the energy content for the sync pulse be higher than that for 
the other symbols, to ensure a high probability of detection. The disadvantage of this 
simple system is that of channel efficiency¹. For instance, if the code is binary, the 
inclusion of a third symbol for the marker will, at best, give an efficiency of 63.1%, even 
for long frames (i.e. infrequent transmission of the marker) [Scholtz, 1980]. 
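The 63.1% figure can be recovered with a short calculation, sketched here under the assumption of a noiseless channel on which all three symbols (the two data symbols and the marker) are equally usable. Such a channel has a capacity of $\log_2 3$ bits per channel symbol, whereas binary data conveys at most one bit per channel symbol, a rate approached as the marker becomes arbitrarily infrequent:

$$\text{channel efficiency} \le \frac{1}{\log_2 3} = \log_3 2 \approx 0.631 .$$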

This scheme is slightly more involved with variable-length codes, since in this case 
the insertion of the sync pulse cannot be periodic. Bedi et al. [1992] suggest inserting a 
synchronisation pulse every $n$ codewords. The decoder then will synchronise every $n$ 
codewords with the help of the synchronisation pulses. Between two synchronisation 
pulses, however, the decoder still may lose synchronisation. The authors suggest using a 
decoder which chooses the best $n$-codeword sequence among all possible such sequences. 
However, this could become quite impractical for large $n$. This technique still suffers from 
the same efficiency problem. 

¹ Channel efficiency is defined as the ratio (Transmission Rate) / (Channel Capacity). 

An extension to this idea is to replace the single special symbol marker with a 
sequence of $m$ symbols $\in X$. Using this $m$-symbol sequence (also referred to as a comma) 
as a marker will now improve the channel efficiency while increasing the complexity of the 
decoder. Again this marker ought to be repeated periodically within a message. Several 
schemes may be employed in this case. For instance, it may be enforced that the marker 
will not appear within the data, using what Stiffler [1971] calls comma codes. However, in 
the case of channels with errors, where we may have the situation that a particular error 
pattern causes a codeword to be transformed into a comma, this requirement may be 
relaxed without much loss in performance. In this case, several occurrences of the comma 
must be observed to maintain synchronisation. This ideally entails the use of fixed-length 
codes in order to maintain the required periodicity. 
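A familiar way of keeping a marker out of the data is the bit stuffing used with the HDLC flag 01111110: the transmitter inserts a 0 after every run of five consecutive 1s, and the receiver removes it again. This is a different mechanism from the comma codes of Stiffler and is shown here only as an illustrative example. The Python function names are hypothetical, and the receiver sketch assumes an error-free, correctly stuffed input.

```python
FLAG = "01111110"


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")               # stuffed bit breaks the run
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the bit that follows every run of five consecutive 1s."""
    out, run, drop_next = [], 0, False
    for b in bits:
        if drop_next:                     # this is the stuffed 0
            drop_next, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            drop_next = True
    return "".join(out)


data = "0111111011111100"
frame = FLAG + stuff(data) + FLAG
assert FLAG not in stuff(data)            # the marker cannot occur in the data
assert unstuff(stuff(data)) == data
```

The price paid is a small, data-dependent expansion of the message, which is the efficiency trade-off discussed above.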

2.4.2. Synchronisable Codes 

If the insertion frequency of the comma is such that it occurs every codeword, then 
we have what are called prefixed comma-free codes [Ramamoorthy & Tufts, 1967]. This 
idea was first introduced by Gilbert [1960]. Here, a special prefix $p$ of length $l_p$ is used to 
mark the codeword boundaries. Each codeword in a prefixed comma-free code $C$ is of the 
form $pw$, where $w$ is a word over $X$ of fixed length $l_w$. The code is constructed such that $p$ 
will be some distance $d$ ($\ge 1$) from all $l_p$-bit sub-sequences in $C^+$. In this case the code $C$ is 
said to be synchronisable. 

Definition 2.1: A code $C$ is said to be synchronisable with finite delay $Z$ iff it is 
uniquely decodable and if there exists an integer $Z$ such that if $x \in X^+$, $|x| > Z$, and $yxz \in 
C^+$, then $x$ has at least one decomposition $x = x_1 x_2$ such that either $y x_1 \in C^+$ and $x_2 z \in C^*$ 
or $y x_1 \in C^*$ and $x_2 z \in C^+$; in words, iff the decoder can determine a codeword boundary in 
a sequence of codewords consisting of at least $Z$ code symbols, given that the start of the 
sequence is a suffix of a codeword in $C$. 




Necessary and sufficient conditions for synchronisability and an algorithm to 
determine the synchronisation delay are given by Capocelli [1979], who gives a simplified 
unified treatment for unique decodability, decoding delay and synchronisability of codes. 

For a synchronisable code, as defined in Definition 2.1, if one or more code symbols 
are lost (but not a complete codeword), the following two outcomes may be possible. 

(1) The received sequence is not decodable. 

(2) The received sequence is decodable, but with the property that correct 
synchronisation is achieved automatically after the first few words are decoded. 

For codes which exhibit property (1) above, we have the advantage that we have an 
indication of loss of synchronisation. In this case, we may simply request a re- 
transmission, if there is a feedback channel. Otherwise, we need to discard symbols from 
the received sequence until the sequence is decodable again, at which point it is hoped that 
synchronisation has been regained. Note that in this case the code cannot be exhaustive. 

For codes which exhibit property (2), we have the advantage that no extra control 
logic is required at the decoder to acquire synchronisation. Consequently, such codes are 
called self-synchronising. However, this is achieved at the expense of losing the ability to 
detect the out of synchronisation condition. 

Note that any given code may exhibit both properties for different sequences. 
However, self-synchronising codes cannot be of fixed-length, since in this case, if the 
decoder is out of synchronisation, unless there is another error it will continue to misplace 
the codeword boundaries indefinitely. 

The prefixed comma-free codes mentioned above are an example of codes with 
property (1). In this case, it is easy to see that the synchronisation delay is $l_p + l_w$ [Scholtz, 
1980]. 

2.4.2.1. Comma-Free Codes 

Another example of codes which exhibit property (1) are the comma-free codes 
introduced by Golomb et al. [1958], of fixed length L. 






Definition 2.2: A fixed-length code $C$ is said to be comma-free if for every pair of 
words $c_i = c_{i1} c_{i2} \cdots c_{iL}$ and $c_j = c_{j1} c_{j2} \cdots c_{jL}$, $c_i, c_j \in C$, the words 
$c_{ik} c_{i(k+1)} \cdots c_{iL} c_{j1} c_{j2} \cdots c_{j(k-1)}$, $k = 2, 3, \ldots, L$, are not in $C$. 

Hence, following this definition, it is easy to see that when the decoder is out of 
synchronisation, the received sequence will not be decodable. To regain synchronisation 
the decoder simply deletes symbols consecutively until it can decode a sequence of 
codewords. In this case the synchronisation delay is $2L-2$. Notice that in the original 
definition the code is assumed to be of fixed-length. Scholtz [1969] gives a simple 
construction algorithm for comma-free codes with a maximum number of codewords. In 
this paper, he also extends the notion of comma-free codes to variable-length codes and 
gives a construction procedure for these codes. Again, in the case of variable-length 
comma-free codes, the decoder must scan the received sequence to find the correct 
synchronisation. 
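Definition 2.2 can be checked by brute force for small fixed-length codes. The Python sketch below forms every word that straddles the boundary between two codewords and tests whether it is itself a codeword. The function name and the two example codes are chosen here for illustration and are not taken from the cited papers.

```python
from itertools import product


def is_comma_free(code) -> bool:
    """Brute-force test of Definition 2.2 for a fixed-length code."""
    C = set(code)
    lengths = {len(c) for c in C}
    if len(lengths) != 1:
        raise ValueError("the definition applies to fixed-length codes")
    L = lengths.pop()
    for ci, cj in product(C, repeat=2):   # pairs include ci == cj
        for k in range(1, L):             # every out-of-phase position
            if ci[k:] + cj[:k] in C:      # overlap word is a codeword
                return False
    return True


print(is_comma_free(["100", "101"]))      # no overlap word is a codeword
print(is_comma_free(["010", "101"]))      # 010|101 contains 101 out of phase
```

The number of checks grows with the square of the number of codewords, which is acceptable for the short codes used as examples here.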

Kendall and Reed [1962] discuss a sub-class of fixed-length comma-free codes called 
path invariant comma-free codes which turn out to be relatively easy to encode and decode, 
and with synchronisation delay of L. However, these codes are not very efficient. 

Levy [1966] extends the idea of comma-free codes to error-correcting codes. If a 
suitable modification vector² is added to a cyclic error-correcting code (with sufficient 
redundancy), a comma-free code may be generated. Hence, a simple method can be 
devised whereby the relative decoding simplicity of cyclic error-correcting codes is 
retained, while their synchronisability capability is greatly enhanced, with minimal 
overheads (just the addition of a vector at the transmitter and at the receiver) (see also 
Tavares and Fukada [1969]). For less redundant codes Levy considered a relaxation of the 
comma-freedom required by specifying code parameters $[z, \chi]$. Here, $z$ is the maximum 
synchronisation slip considered, and $\chi$ is the minimum Hamming distance of the resultant 
(out of synchronisation) words to the codewords in the code. If $z$ is specified equal to the 
codeword length, then $\chi$ gives the index of comma freedom. 

² A modification vector is a word of the same length as a codeword but which itself is not a codeword. 
When added to all codewords in a code, it modifies the code while retaining the same distance properties of 
the original code. 

2.4.3. Statistically Synchronisable Codes 

The codes studied in the previous section are also known as bounded synchronisation 
delay (BSD) codes, since the number of code symbols, $z$, that need to be observed to 
achieve synchronisation is bounded by $Z$. Another class of codes which has been studied 
extensively is that of statistically synchronisable codes [Gilbert & Moore, 1959] [Stiffler, 
1971] [Wei & Scholtz, 1980] [Capocelli et al., 1988]. Here, the synchronisation delay is 
not bounded. However, with probability one, the code will synchronise if a large enough 
number of code symbols is observed; i.e. for statistically synchronisable codes, the 
probability of synchronising after receiving $z$ code symbols is 

$$\lim_{Z \to \infty} \Pr\{z < Z\} = 1. \qquad (2.13)$$

Here it is assumed that the source can generate all possible messages (an $\varepsilon$-guaranteed 
message source [Wei & Scholtz, 1980]). 

Capocelli et al. [1988] have shown that a code is statistically synchronisable iff it has 
at least one synchronising sequence. This is a sequence of codewords that is not a 
sub-string of any other sequence of codewords. Hence, whenever this synchronising sequence 
of codewords appears within the message, the decoder will always re-synchronise (hence 
the necessity of having an $\varepsilon$-guaranteed source). Capocelli et al. [1988] give an algorithm 
to test whether a code is statistically synchronisable and this is simplified for the special 
case of prefix codes. Neumann [1962a, 1962b, 1964] considers the construction of such 
codes having what he calls a synchronising input sequence. Hatcher [1969] gives 
error-correcting capability to these codes. 

When the synchronising sequence consists of just a single codeword, the code is 
called synchronous [Ferguson & Rabinowitz, 1984]. In this case, all that is required for the 
code to be statistically synchronisable is that the synchronising codeword occurs with 
non-zero probability. 






A codeword $c_i = c_{i1} c_{i2} \cdots c_{i l_i}$ is synchronising for code $C$ if it satisfies the following two 
conditions. 

1. For every codeword $c_j = c_{j1} c_{j2} \cdots c_{j l_j}$ in $C$ with $l_j > l_i$ in which $c_i$ appears as a 
sub-sequence, $c_i$ must be the suffix part of $c_j$. 

2. If a prefix of $c_i$ can form part of a suffix of another codeword, then the remaining part 
of $c_i$ must form a sequence of codewords. 

In this case, when $c_i$ is received without any errors, the decoder will always 
re-synchronise and start decoding correctly from the first codeword following $c_i$. 

Since in the case of statistically synchronisable codes, the synchronising delay is not 
bounded, then it is of interest to know the average number of code symbols which need to 
be observed before synchronisation is achieved. We shall call this the average 
synchronisation delay of the code. Another possible measure for the synchronisation 
capability of the code is the average error span [Maxted & Robinson, 1985], which is the 
average number of source symbols lost during the re-synchronisation process. 

Ferguson and Rabinowitz [1984] argue that to reduce the error-propagation (i.e. to 
reduce the average synchronisation delay or the average error-span) in a code, the code 
must be designed such that the probability of receiving a synchronising codeword is 
maximised. This can be achieved by devising a code with more synchronising codewords 
and/or shorter (more probable) ones. Constructing codes with such a property still remains 
an open problem. 

Note that the presence of synchronising codewords may not necessarily decrease the 
code efficiency. In fact, synchronous Huffman codes may be designed using a heuristic 
algorithm [Ferguson & Rabinowitz, 1984]. In this case, the synchronous code will also be 
optimal. Deterministic construction algorithms for synchronous variable-length codes, 
which are, however, sub-optimal (with average length slightly larger than that obtained by 
the Huffman algorithm) are given by Montgomery and Abrahams [1986] and Capocelli et 
al. [1992]. In many cases, the resulting codes have better statistical synchronising 
performance than the corresponding optimal synchronous codes. 






Montgomery and Abrahams [1986] have pointed out a further complication to the 
determination of the code with the "best" synchronisation properties. Re-synchronisation 
is attained not only through the occurrence of a synchronising codeword, but also if the 
sequence corresponding to a synchronising codeword occurs somewhere else within the 
message. For example, for the code given in Table 2.1, the codewords 010 and 0110 are 
both synchronising. However, the sequence 010, say, may also occur when ab (0010) is 
transmitted. Hence, whereas the probability of the two synchronising codewords is 0.190, 
the probability that a synchronising sequence occurs is 0.325. In addition, there may be 
other synchronising sequences which are not codewords. 



Source Symbol    Probability    Codeword c_i 
a                0.28           00 
b                0.26           10 
c                0.13           010 
d                0.12           110 
e                0.06           0110 
f                0.05           0111 
g                0.05           1110 
h                0.05           1111 

Table 2.1: Synchronous code for eight-symbol source 
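The claims made about Table 2.1 can be checked with a short Python sketch. A sequence is taken here to be synchronising if an instantaneous decoder that starts reading it from any possible out-of-step state (any proper prefix of a codeword) finishes exactly on a codeword boundary. This brute-force formulation and the function names are assumptions made for illustration; the check relies on the code being an exhaustive prefix code, as the code of Table 2.1 is.

```python
CODE = {"a": "00", "b": "10", "c": "010", "d": "110",
        "e": "0110", "f": "0111", "g": "1110", "h": "1111"}
PROB = {"a": 0.28, "b": 0.26, "c": 0.13, "d": 0.12,
        "e": 0.06, "f": 0.05, "g": 0.05, "h": 0.05}
CODEWORDS = set(CODE.values())


def decoder_states(codewords):
    """All proper prefixes of codewords, including the empty word."""
    return {c[:k] for c in codewords for k in range(len(c))}


def run_decoder(state, bits, codewords):
    """Feed bits to a prefix decoder that starts part-way into a codeword."""
    for b in bits:
        state += b
        if state in codewords:            # codeword recognised: boundary
            state = ""
    return state


def is_synchronising(seq, codewords):
    """True if every starting state ends on a codeword boundary."""
    return all(run_decoder(s, seq, codewords) == ""
               for s in decoder_states(codewords))


sync_symbols = [s for s, cw in CODE.items() if is_synchronising(cw, CODEWORDS)]
p_sync_codeword = sum(PROB[s] for s in sync_symbols)
print(sync_symbols, round(p_sync_codeword, 3))
print(is_synchronising("0010", CODEWORDS))   # ab contains the sequence 010
```

The sketch singles out the codewords for c and e (010 and 0110), whose probabilities sum to 0.19, and confirms that the sequence for ab also re-synchronises the decoder even though neither a nor b is a synchronising codeword.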



Accordingly, Titchener [1988] has questioned the validity of this model for the 
synchronisation recovery of a variable-length code and has proposed an iterative 
construction algorithm for statistically synchronisable codes. The resultant codes are called 
T-codes. The construction is based on the idea that if the starting code is statistically 
synchronising, then by deleting a codeword from this original code and then using this 
codeword to build new codewords, the resultant code will also be statistically 
synchronisable [Titchener, 1984]. The construction is similar to that given by Scholtz 
[1966] for BSD codes. However, Titchener's construction ensures an exhaustive code. 

Maxted and Robinson [1985] (see also [Monaco & Lawler, 1987]) use a state model 
to derive an expression for the error span of variable-length codes for single bit errors. 
Rahman and Misbahuddin [1989] extend this model to give the performance of 
variable-length codes on the BSC. This is further enhanced by Takishima et al. [1994] who give a 
numerical method to quickly evaluate the error span, both for a single bit error and for the 
BSC. They also observed, as pointed out above, that maximising the probability of 
receiving a synchronising codeword does not always produce the code with the minimum 
error span. Consequently, they give a heuristic algorithm for constructing good 
self-synchronising codes, better than the ones given by Ferguson and Rabinowitz [1984] and 
Montgomery and Abrahams [1986]. 

2.5. α-Correcting Codes 

In the previous sections we have highlighted some properties of variable-length 
codes. However, with the exception of Section 2.4, the noiseless case was always assumed. 
Even in Section 2.4, we have basically assumed that there is a single error event on the 
channel which causes loss of synchronisation, and that no further errors occur during the 
re-synchronisation process. 

The first work found in the literature concerning variable-length codes which are also 
designed to combat channel noise is that done by the Coding Group at Parke Mathematical 
Laboratories from 1957 to 1968. Most of this work is reported by Hartnett [1974]. We 
shall call such codes, in general, variable-length error-correcting (VLEC) codes. 

Here, we shall be dealing mainly with their work concerning channel models in 
which only substitution errors are allowed. This model is one which allows a set of 
permissible error-patterns based on the code being used. This turns out to be a big 
disadvantage when dealing with variable-length codes, as will be shown in Chapter 3. The 
problem here is that they treat VLEC codes as block codes whereas, as will be shown, they 
are really trellis codes, exhibiting a form of "memory". 

In a t-error-correcting block code, all error patterns of t bits or fewer are correctable.
If the code is not perfect, then we may allow other arbitrary error patterns (with the
minimum possible weight) to be also correctable [MacWilliams & Sloane, 1978]. For
instance, the code {00000, 01110, 10011, 11101} has minimum Hamming distance three
[Sweeney, 1991]. This implies that it can correct all single bit error-patterns. However,
since this is not a perfect code, it can also correct some double-bit error patterns. From
the possible remaining error-patterns, one may arbitrarily choose the set {01001, 00101}
with minimum weight. Hence, the correctable error patterns for this code are {00000, 00001,
00010, 00100, 01000, 10000, 01001, 00101}. Adding each of these eight error-patterns to
each of the four codewords yields all 32 binary 5-tuples, and hence any 5-tuple may be
decoded into one of the four codewords. As an example, if any of the words {01110, 01111,
01100, 01010, 00110, 11110, 00111, 01011}³ is received, the decoded codeword will be
01110. The reason why we choose these as our correctable error-patterns is that, over the
BSC, these are the most probable ones, and hence by choosing these we minimise the
probability of making a decoding error and thus achieve maximum likelihood decoding.
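The choice of correctable error-patterns described above can be checked mechanically. The
following Python fragment is a minimal sketch of our own (not taken from the cited works):
it groups the 32 binary 5-tuples into cosets of the code and picks, in each coset, a pattern
of minimum weight. The tie-breaking rule (smallest integer value) is an assumption made
here for illustration; it happens to reproduce the arbitrary choice {01001, 00101} made in
the text.

```python
# The four codewords of the example code, written as 5-bit integers.
CODE = [0b00000, 0b01110, 0b10011, 0b11101]
N = 5


def weight(x: int) -> int:
    """Hamming weight of a binary word held in an integer."""
    return bin(x).count("1")


def coset(e: int) -> frozenset:
    """The coset of the (linear) code that contains the pattern e."""
    return frozenset(e ^ c for c in CODE)


def coset_leaders() -> dict:
    """Map each coset to a minimum-weight member (ties: smallest integer)."""
    leaders = {}
    for e in sorted(range(1 << N), key=lambda e: (weight(e), e)):
        leaders.setdefault(coset(e), e)
    return leaders


LEADERS = coset_leaders()


def decode(word: int) -> int:
    """Remove the assumed error-pattern (the coset leader) from the word."""
    return word ^ LEADERS[coset(word)]


def bits(x: int) -> str:
    return format(x, "05b")


correctable = sorted(bits(e) for e in LEADERS.values())
decoded_to_01110 = sorted(bits(w) for w in range(1 << N) if decode(w) == 0b01110)
```

Here `correctable` lists the eight error-patterns and `decoded_to_01110` lists the eight
received words that are decoded as 01110.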

Calabi and Arquette [1974a] have adapted this notion to variable-length codes. They
define an admissibility (or error) mapping α on the codewords of a VLEC code C, such
that if a received word $w \in \alpha(c_i) \subset X^*$, then w will be decoded as $c_i$,
where $c_i \in C$. They also define the expanded code of C under the admissibility mapping
α as $E = \bigcup_j \{\alpha(c_j) : c_j \in C\}$. Then, in order for C to be decodable in
noise (under the α-admissibility mapping), E must be uniquely decodable [Calabi &
Arquette, 1974b]. This is only true if α allows only substitution errors, in which case all
the words in $\alpha(c_i)$ will be of the same length as $c_i$.

3 For clarity, the corrupted bits are shown in bold.

Sato [1979] and Capocelli [1982] have considered something similar, which they
called multi-level encodings. In particular, they have shown that codeword decomposition
is not equivalent to decodability. To illustrate this, consider that we have a two-symbol
source {a, b} and that the code alphabet is ternary, i.e. X = {0, 1, 2}. If the two words
{0012, 00} are mapped to a and {012, 12012} are mapped to b, then the expanded code
{0012, 00, 012, 12012} is not uniquely decodable. However, we can still perform
"error-correction" under this mapping. As an example, if the message 0012012 is received,
this could be decomposed either as 0012.012 or as 00.12012. However, both these
decompositions will result in the decoded message ab. Notice here that the mapping is
allowing insertion/deletion errors as well. In this case it will be even more difficult to
characterise the admissibility mapping required.
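The ternary example above can be reproduced with a few lines of Python. This is only an
illustrative sketch of our own; the function name `decompositions` and the dictionary
`MAPPING` are hypothetical helpers, not notation from Sato or Capocelli. It enumerates every
way of splitting a received message into words of the expanded code and then maps each
split to source symbols.

```python
# Expanded code of the example: each word is mapped to a source symbol.
MAPPING = {"0012": "a", "00": "a", "012": "b", "12012": "b"}


def decompositions(message: str, words) -> list:
    """Return every way of writing `message` as a concatenation of `words`."""
    if not message:
        return [[]]
    result = []
    for w in words:
        if message.startswith(w):
            for rest in decompositions(message[len(w):], words):
                result.append([w] + rest)
    return result


splits = decompositions("0012012", MAPPING)
decoded = {"".join(MAPPING[w] for w in split) for split in splits}
```

For the message 0012012, `splits` contains the two decompositions quoted in the text, while
`decoded` contains the single source message ab: the expanded code is not uniquely
decodable, yet the decoded message is unambiguous.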

Again assuming that only substitution errors are allowed, the problem still remains of
how best to define the admissibility mapping. This was relatively easy to do in the case of
fixed-length codes, because for maximum likelihood decoding we must choose α such that
those error-patterns with minimum weight are selected, since these are the most likely on
the BSC. However, this choice of α becomes problematic in the case of variable-length
codes, even for the simple case of the BSC. Calabi and Arquette [1974a] opted for an
arbitrary choice, whereby α is a function of the codeword, with $\alpha_t$ denoting that the
mapping can correct all error-patterns of t bits or fewer⁴. In general, t will be a function
of the codeword chosen.

Example 2.1: Consider the α-correcting code $C_2$ given in Table 2.2. Then $\alpha_1(a)$ =
{000, 001, 010, 100}, i.e. $\alpha_1(a)$ contains the codeword for a plus all words which
differ by one bit from this codeword. Similarly $\alpha_2(b)$ will contain the codeword for
b, and all words which differ from this codeword by at most two bits. Thus there are 4 words
in $\alpha_1(a)$ and 37 words in $\alpha_2(b)$. Applying the Sardinas and Patterson [1953]
test on the expanded code $\alpha_1(a) \cup \alpha_2(b)$ shows that this is uniquely
decodable, and hence $C_2$ is α-correcting and will correct all single bit errors in a and
all single and double bit errors in b.



Source Symbol    Codeword
-------------    --------
a                000
b                00011111

Table 2.2: An α-correcting code $C_2$
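The counts and the unique decodability claimed in Example 2.1 can be verified with a short
Python sketch. The code below is our own illustration: `ball` builds the set of words within
a given Hamming distance of a codeword, and `uniquely_decodable` is a straightforward
implementation of the Sardinas-Patterson test on sets of dangling suffixes. Both function
names are hypothetical.

```python
from itertools import combinations


def ball(word: str, t: int) -> set:
    """All binary words within Hamming distance t of `word`."""
    out = set()
    for r in range(t + 1):
        for positions in combinations(range(len(word)), r):
            w = list(word)
            for p in positions:
                w[p] = "1" if w[p] == "0" else "0"
            out.add("".join(w))
    return out


def dangling(A: set, B: set) -> set:
    """Suffixes left over when a word of A is a proper prefix of a word of B."""
    return {b[len(a):] for a in A for b in B
            if len(b) > len(a) and b.startswith(a)}


def uniquely_decodable(code) -> bool:
    """Sardinas-Patterson test: fail iff some dangling suffix is a codeword."""
    code = set(code)
    current = dangling(code, code)
    seen = set()
    while current:
        if current & code:
            return False
        seen |= current
        # Next generation of dangling suffixes, skipping those already examined.
        current = (dangling(code, current) | dangling(current, code)) - seen
    return True


alpha1_a = ball("000", 1)
alpha2_b = ball("00011111", 2)
expanded = alpha1_a | alpha2_b
```

With these definitions, `len(alpha1_a)` and `len(alpha2_b)` give the 4 and 37 words quoted
in the example, and `uniquely_decodable(expanded)` confirms the result of the test.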



There are two main problems with this notion. Firstly, the expanded code may not always
be exhaustive, as in the case of Example 2.1. The corresponding property in the case of
fixed-length codes is that not all codes are perfect. However, in the case of fixed-length
codes it is relatively easy to either output an erasure in the case of a detectable
error-pattern of more than t bits, or, as indicated earlier, correct some error-patterns
with more than t bits. In the case of variable-length α-correcting codes, should an error
pattern exceed the correction capability of the code, it is difficult to determine which
codeword length to decode. One possibility is to output erasures until a word in the
expanded code is detected. However, the error-propagation caused by this loss of
synchronisation will have a negative effect on the overall performance of the code. The
second problem is that the decoding delay may be unbounded, as is the case for the code
given in Example 2.1, for instance. These problems may be partly solved by enforcing that
the code be instantaneously decodable.

4 In fact they also discuss other mappings suitable for different kinds of channels.

2.6. α-Prompt Codes

A code C is α-prompt iff for any two distinct codewords $c_i$, $c_j$, no sequence in
$\alpha(c_i)$ is a prefix of a sequence in $\alpha(c_j)$; in other words, iff the expanded
code E of C is a prefix code. Hence, α-prompt codes are instantaneously decodable (under
the mapping α) VLEC codes. An example of an $\alpha_1$-prompt code for the 26-symbol English
alphabet is given in Table A.1 which is reproduced, with slight modification, from Calabi
and Arquette [1974a].

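Consider the variable-length code $C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$. Let
$c_i = c_{i1} c_{i2} \cdots c_{iL_j}$, where $c_i \in C$ and $|c_i| = L_j$. Then, we define
the prefix decomposition of $c_i$ to be the set $\{p_{i1}, p_{i2}, \ldots, p_{ij}\}$, where
$p_{ik} = c_{i1} c_{i2} \cdots c_{iL_k}$ for all k = 1, 2, ..., j. In particular, note that
$p_{ij} = c_i$. We also define the prefix decomposition of C to consist of the sets
$\{P_1, P_2, \ldots, P_\sigma\}$, where $P_k$ is the set of all length-$L_k$ prefixes
$p_{ik}$ taken over the codewords $c_i$ of length at least $L_k$.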

2.6.1. Prefix Decoding Algorithm 

Since in the case of α-prompt codes the related expanded code is instantaneously
decodable, it is much easier to formulate a decoding algorithm for α-prompt codes,
even including error patterns which are not in the admissibility range α, than it was for
the case of α-correcting codes. A possible complete decoder for an α-prompt code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ with prefix decomposition
$\{P_1, P_2, \ldots, P_\sigma\}$ over the BSC is the following.

1. Take the next $L_1$ bits from the received bit sequence and decode these to a prefix
from the set $P_1$ using the α-admissibility mapping. If no word from $P_1$ is found
satisfying this mapping, then choose that prefix at the minimum Hamming distance
to the received sequence having the minimum index⁵. Let the decoded prefix be $p_{i1}$.

2. If $p_{i1} \in C$, then decode the source symbol corresponding to $c_i = p_{i1}$ and go
to Step 1. Otherwise, let j = 2 and go to Step 3.

3. Take the previous $L_{j-1}$ bits considered in the previous steps and the next
$L_j - L_{j-1}$ bits from the received bit sequence (i.e. $L_j$ bits in all) and decode
these to a prefix from the set $P_j$ using the α-admissibility mapping. Again, if no word
from $P_j$ is found satisfying this mapping, then choose that prefix at the minimum Hamming
distance to the received sequence having the minimum index. Let the decoded prefix be
$p_{ij}$.

4. If $p_{ij} \in C$, then decode the source symbol corresponding to $c_i = p_{ij}$ and go
to Step 1. Otherwise, increment j and go back to Step 3.

We shall call the above algorithm prefix decoding of α-prompt codes. Note that the
above algorithm always decodes to a codeword, even if the error pattern is not
α-admissible.
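The four steps above may be sketched in Python as follows. This is a simplified
illustration of our own and not the decoder of Calabi and Arquette: it assumes that the
α-admissibility test can be folded into a single rule, namely choosing the prefix at
minimum Hamming distance with ties resolved in favour of the lowest index. For concreteness
the sketch borrows the eight codewords of code $C_3$ from Table 2.3 (Section 2.8); the
function name `prefix_decode` is hypothetical.

```python
# Codewords of C3 (Table 2.3), listed in index order (shortest first).
CODE = {
    "a": "00000", "b": "10110",
    "c": "11001111", "d": "01111111", "e": "11011010",
    "f": "01101010", "g": "01011001", "h": "11101001",
}


def hamming(x: str, y: str) -> int:
    return sum(p != q for p, q in zip(x, y))


def prefix_decode(bits: str, code=CODE) -> str:
    """Decode a bit string by successively longer prefixes (Steps 1 to 4)."""
    symbols = list(code)                          # index order of the codewords
    lengths = sorted({len(w) for w in code.values()})
    out, pos = [], 0
    while pos + lengths[0] <= len(bits):
        for L in lengths:
            block = bits[pos:pos + L]
            if len(block) < L:                    # input ended inside a codeword
                return "".join(out)
            # P_j: length-L prefixes of all codewords of length at least L.
            candidates = [s for s in symbols if len(code[s]) >= L]
            # min() keeps the first minimum, i.e. the lowest index on ties.
            best = min(candidates, key=lambda s: hamming(block, code[s][:L]))
            if len(code[best]) == L:              # the decoded prefix is a codeword
                out.append(best)
                pos += L
                break
    return "".join(out)
```

For example, `prefix_decode("00000" + "10110" + "11001111")` recovers the message abc, and
a block such as 00100, which differs from the codeword for a in one bit, is still decoded
as a.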

2.6.2. Segment Decomposition 

Consider the variable-length code $C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$. Let
$c_i = c_{i1} c_{i2} \cdots c_{iL_j}$, where $c_i \in C$ and $|c_i| = L_j$. Then, the segment
decomposition of $c_i$ is defined to be the set $\{q_{i1}, q_{i2}, \ldots, q_{ij}\}$ where
$q_{i1} = c_{i1} c_{i2} \cdots c_{iL_1}$, $q_{i2} = c_{i,L_1+1} c_{i,L_1+2} \cdots c_{iL_2}$,
..., $q_{ij} = c_{i,L_{j-1}+1} c_{i,L_{j-1}+2} \cdots c_{iL_j}$ [Bernard & Sharma, 1988]⁶.
Hence, $c_i = q_{i1} q_{i2} \cdots q_{ij}$. We shall define the segment decomposition of C to
be the sets $\{Q_1, Q_2, \ldots, Q_\sigma\}$ where $Q_k = \bigcup_i \{q_{ik}\}$, the union
being taken over all codewords that possess a k-th segment. Bernard and Sharma [1988] define
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes that can correct $t_1$ substitution errors in
the first segment, $t_2$ substitution errors in the second segment and so on. This
admissibility mapping is different from the one considered earlier, where the number of
correctable errors was defined per codeword, whereas here they are defined per segment.
Consequently the decoding algorithm will be slightly different and will be given in Section
2.6.3.

5 The minimum Hamming distance requirement will ensure that the most likely prefix is chosen,
whereas the requirement that the minimum index prefix is chosen will ensure that the shortest
(most probable) prefix is chosen.

6 The definition given by Bernard and Sharma is essentially the same as the one given here.
However, in their case they use $l_i$ instead of $L_j$ for the length of codeword $c_i$.
Since in general there will be more than one codeword with the same length in the code,
their definition effectively allows for empty segments, whereas ours does not.

Some combinatorial results are possible using this mapping, which unifies some 
theories in Coding Theory from the areas of noiseless coding and error-correction coding. 

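Theorem 2.2: An $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ must satisfy the condition

$$\sum_{i=1}^{\sigma} s_i \, |r|_{L_i} \, q^{-L_i} \le 1 \qquad (2.14)$$

where $|r|_{L_i}$ is the effective range of a codeword, $c_i$, of length $L_i$; i.e. the
number of words in $\alpha(c_i)$, given by

$$|r|_{L_i} = |r|_{l_1} \, |r|_{l_2} \cdots |r|_{l_i}, \qquad (2.15)$$

$$|r|_{l_k} = \sum_{j=0}^{t_k} \binom{l_k}{j} (q-1)^j, \qquad (2.16)$$

with $l_k = L_k - L_{k-1}$ the length of the k-th segment ($L_0 = 0$) and q the size of the
code alphabet.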

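Theorem 2.3: An $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ exists if

    [equation (2.17), together with the auxiliary definition that follows it, is illegible
    in this copy]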

Both Theorems 2.2 and 2.3 are proved by Bernard and Sharma [1988]. Theorem 2.2
reduces to the Hamming sphere packing upper bound in the case of fixed-length
error-correcting codes, and to the necessary part of Kraft's inequality for the noiseless
case. Theorem 2.3, on the other hand, reduces to the Gilbert lower bound for the case of
fixed-length error-correcting codes, and to the sufficiency part of Kraft's inequality for
the noiseless case. Further, Bernard and Sharma [1990] prove the following theorem.

Theorem 2.4: The average length, $L_{average}$, of an
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ when used to encode a source A with symbol
probabilities $P(a_1), P(a_2), \ldots, P(a_s)$ is bounded by

    [bound (2.18) is illegible in this copy]

with equality iff

    [condition (2.19), required to hold for all i, is illegible in this copy]

where $l_i$ is the length of codeword $c_i$ and s is the total number of codewords.

Theorem 2.4 reduces to Shannon's first theorem for the noiseless case. 

Bernard and Sharma [1992] use this per segment error-mapping to define perfect
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes.

Definition 2.3: An $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code satisfying (2.14) with
equality can correct up to $t_1$ random errors in the first segment and no more, up to
$t_2$ random errors in the second segment and no more, and so on for each segment. Such a
code is called a perfect $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code.

In their paper, they give several examples of perfect
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes derived from fixed-length perfect codes.
Their construction is very simple. Suppose that $C_1$ and $C_2$ are two perfect fixed-length
error-correcting codes, capable of correcting up to $t_1$ and $t_2$ substitution errors, of
length $L_1$ and $L_2$, and with $S_1$ and $S_2$ codewords, respectively. Note that $C_1$
and $C_2$ need not necessarily be different codes. Then, a perfect
$\alpha_{t_1,t_2}$-prompt code $C_3$ may be constructed by taking $s_1$ codewords from $C_1$
to form length $L_1$ codewords in $C_3$. Each one of the remaining codewords in $C_1$
($S_1 - s_1$ in all) is then used as prefix to all the codewords in $C_2$ to form
$(S_1 - s_1) \times S_2$ codewords of length $L_1 + L_2$. Thus, $C_3$ will be a
$(s_1@L_1, [S_1 - s_1] \times S_2@L_1 + L_2)$ perfect $\alpha_{t_1,t_2}$-prompt code. An
example of a perfect $\alpha_{1,1}$-prompt code for the 26-symbol English source is shown in
Table A.1. This was constructed using the (7,4) Hamming code for $C_1$ and the (3,1)
repetition code for $C_2$, both perfect fixed-length codes with minimum distance 3. This
construction can easily be extended to produce VLEC codes with multiple lengths.
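As a hedged numerical illustration, the sketch below evaluates a sphere-packing sum for this
construction. It assumes that condition (2.14) reads $\sum_i s_i |r|_{L_i} q^{-L_i} \le 1$,
where the effective range $|r|_{L_i}$ is the product of the Hamming-sphere sizes of the
segments making up a codeword of length $L_i$; this reading, and the function names used,
are our own assumptions. The split into 6 codewords of length 7 and 20 of length 10 is
likewise inferred here from $s_1 + (16 - s_1) \times 2 = 26$ rather than quoted from
Table A.1.

```python
from fractions import Fraction
from math import comb


def sphere_size(length: int, t: int, q: int = 2) -> int:
    """Number of q-ary words within Hamming distance t of a word of given length."""
    return sum(comb(length, j) * (q - 1) ** j for j in range(t + 1))


def packing_sum(counts, lengths, t, q: int = 2) -> Fraction:
    """Sum over lengths of s_i * |r|_{L_i} / q**L_i (assumed form of (2.14)).

    counts[i]  : number of codewords of length lengths[i] (lengths ascending)
    t[k]       : number of errors correctable in segment k
    """
    total = Fraction(0)
    effective_range = 1
    previous = 0
    for s_i, L_i, t_i in zip(counts, lengths, t):
        effective_range *= sphere_size(L_i - previous, t_i, q)
        total += Fraction(s_i * effective_range, q ** L_i)
        previous = L_i
    return total


hamming_7_4 = packing_sum([16], [7], [1])          # fixed-length perfect code
kraft_only = packing_sum([1, 2], [1, 2], [0, 0])   # noiseless case {0, 10, 11}
english_26 = packing_sum([6, 20], [7, 10], [1, 1])  # inferred 6@7, 20@10 split
```

Under these assumptions all three sums equal exactly 1: the first is the Hamming bound met
with equality, the second is Kraft's inequality for an exhaustive prefix code, and the third
is consistent with the code built from the (7,4) Hamming and (3,1) repetition codes being
perfect.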




2.6.3. Segment Decoding Algorithm 

Since the admissibility mapping is different for the
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes defined by Bernard and Sharma [1988], the
decoding algorithm given in Section 2.6.1 for α-prompt codes must be altered.

Assume a variable-length $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ with segment decomposition
$\{Q_1, Q_2, \ldots, Q_\sigma\}$.

1. Let j = 1.

2. Take the next $L_j - L_{j-1}$ bits (with $L_0 = 0$) from the received bit sequence and
decode these to a segment from the set $Q_j$ using the α-admissibility mapping. If no word
from $Q_j$ is found satisfying this mapping, then choose that segment at the minimum Hamming
distance to the received sequence having the minimum index. Let the decoded segment be
$q_{ij}$.

3. If $q_{i1} q_{i2} \cdots q_{ij} \in C$, then decode the source symbol corresponding to
$c_i = q_{i1} q_{i2} \cdots q_{ij}$ and go to Step 1. Otherwise, increment j (as long as
j < σ, see below) and go back to Step 2.

We shall call the above algorithm segment decoding of
$\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes.

One problem that may occur in the above algorithm is that a decoded sequence of
segments $q_{i1} q_{i2} \cdots q_{ij}$ may not form a valid codeword prefix, since the
segments are being decoded independently. In that case, j will reach σ before any codeword
is decoded, at which point there are no more segments to decode. One way round this problem
is as follows. Suppose that $q_{i1} q_{i2} \cdots q_{i,j-1}$ is a valid prefix to a
codeword, while $q_{i1} q_{i2} \cdots q_{ij}$ is not⁷. Then, instead of decoding $q_{ij}$
following the above algorithm, this is decoded by choosing the minimum distance segment from
the list of segments arising only from those codewords whose prefix is
$q_{i1} q_{i2} \cdots q_{i,j-1}$. Another possibility is to emit an erasure symbol, if
incomplete decoding is allowed. However, even here, there is the problem of determining
the number of segments in the erasure symbol, since this will affect the decoding of the
subsequent codewords.

7 This is always possible at least for j = 2, since $q_{i1}$ is always a codeword prefix or
a codeword.




2.7. Two-Length Error-Correcting Codes 

Using similar concepts, Dunscombe [1988] defines two-length t-prefix codes capable
of correcting t bit errors in each codeword. Dunscombe proposes that the length of the long
codewords should be a multiple of that of the short codewords and conjectures that the best
performance is achieved when the length of the long codewords is double that of the short
codewords. These codes are really $\alpha_{t,t}$-prompt codes as defined by Bernard and
Sharma [1988]. However, the decoding algorithm given by Dunscombe is essentially equivalent
to the prefix decoding algorithm presented in Section 2.6.1.

Dunscombe also gives a construction algorithm for two-length t-prefix codes which
is essentially the same as the one given by Bernard and Sharma [1992] to construct their
perfect $\alpha_{t_1,t_2,\ldots,t_\sigma}$-prompt codes. The only difference is that the
starting codes $C_1$ and $C_2$ are really the same code (called the base code) and this is
not necessarily perfect. The base code must be a t-error-correcting code, of fixed length L.
If the construction procedure outlined in Section 2.6.2 is followed exactly, then Dunscombe
calls this a complete derived code; otherwise, if during the construction not all codewords
of $C_1$ are used as short codewords or as prefixes of long codewords, or if not all
codewords of $C_2$ are used as suffixes to all the chosen prefixes, then these are simply
called derived codes. Dunscombe also defines a special class of codes called fixed-ratio
derived codes which put some restrictions on the choice of codewords (for details refer to
[Dunscombe, 1988]).

Dunscombe also derives an expression for the expected number of L-bit blocks that
are out of synchronisation given that a synchronisation error occurs, both for complete and
fixed-ratio derived codes. This result is improved upon by Escott [1995], who gives a
similar expression but in terms of the number of codewords (and hence source symbols).
Escott also gives an expression for the probability of symbol error.

2.8. Instantaneous Decoding using the Massey Metric 

Both decoding algorithms given in Section 2.6 are instantaneous, in the sense that as
soon as a codeword is completely received, it will be decodable. This was in fact the main
objective of α-prompt codes. However, the Hamming metric used in both these algorithms
is by no means optimum. Even though these algorithms may correct some error-patterns on a
noisy channel, it does not follow that the decoding process is optimum. The main reason for
this is that, should a decoding error be made on the current codeword, this will have
knock-on effects on subsequent decodings, due to loss of synchronisation. Once there is a
synchronisation error, a relatively large number of subsequent codewords will also be
erroneously decoded, even if in the meantime there are no further errors on the channel.
Hence, although these codes may be decoded instantaneously, if we were to wait for several
codewords before making a decoding decision, better performance⁸ should be expected. Such a
decoding algorithm will be given in Chapter 3.

However, one step forward in this direction was indirectly suggested by Massey 
[1972]. In this paper, he derives what he calls "the required statistic for minimum-error- 
probability decoding of variable-length codes". As we shall see in Chapter 3, this is only 
true for instantaneous decoding of VLEC codes. That is, only if VLEC codes are 
considered to be pseudo-block codes. 

In his paper, Massey converts the variable-length code
$C(s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma)$ into a fixed-length code by adding a
"random tail" to each codeword, such that each resultant word has length L, where
$L = L_\sigma$ is the length of the maximum length codeword. Massey assumes that the random
tail is selected statistically independently of the codeword and that the bits in this tail
are chosen independently according to a probability measure Q( ) over the channel input
alphabet. In fact, these random tails consist of bits resulting from subsequent codewords in
the encoded message. Hence, the assumption that the bits in the tail are independent is not
exactly true and here it will be dropped.



8 Better in the sense that fewer decoding errors will be made.




Quoting from Massey's results [Massey, 1972]⁹, the joint probability of sending
message (codeword) $c_m$ and receiving y is given by

$$\Pr(c_m, y) = P_m \prod_{i=1}^{l_m} P(y_i \mid c_{mi}) \prod_{j=1}^{L-l_m} P_0(y_{l_m+j}) \qquad (2.20)$$

where

$$P_0(y_i) = \sum_k P(y_i \mid t_k) \, Q(t_k). \qquad (2.21)$$

Here, $y = y_1 y_2 \cdots y_L$ is the received L bit sequence; $P_m$ is the probability of
sending codeword $c_m = c_{m1} c_{m2} \cdots c_{ml_m}$; $l_m$ is the length of $c_m$; and
$t_k$ is the bit at position k in the random tail. Hence, given y, the optimum decoding rule
using this model is to choose m' as the value of m which maximises $\Pr(c_m, y)$. We shall
call the metric given by equation (2.20) the Massey metric. Note that Massey goes on to
derive the Fano metric, used in sequential decoding of convolutional codes, from equations
(2.20) and (2.21) (see for example [Lin & Costello, 1983]).

Removing now Massey's assumption, let $T_m$ be the set of possible tails for codeword
$c_m$ with length $l_m$. Then,

$$T_m = \{\, c_{i_1} c_{i_2} \cdots c_{i_{j-1}} \, c_{i_j 1} c_{i_j 2} \cdots c_{i_j k} \;:\; c_{i_1}, c_{i_2}, \ldots, c_{i_j} \in C \ \text{and}\ l_{i_1} + l_{i_2} + \cdots + l_{i_{j-1}} + k = L - l_m \,\}$$

(see footnote 10), i.e. $T_m$ is the set of all codeword combinations followed, possibly, by
prefixes of codewords, with total length equal to $L - l_m$. The probability distribution of
$T_m$ will then be simply derived from the probability of the constituent codewords.

Using the above probability distribution for the "random" tail and assuming that the
code is binary and that the channel is the BSC with cross-over probability p, equations
(2.20) and (2.21) become

$$\Pr(c_m, y) = P_m \, p^{h_m} (1-p)^{l_m - h_m} P_{0m} \qquad (2.22)$$

where

$$P_{0m} = \sum_{t_i \in T_m} P(t_i) \, p^{h_{t_i}} (1-p)^{L - l_m - h_{t_i}}. \qquad (2.23)$$

Here, $h_m$ is the Hamming distance between the first $l_m$ bits of y,
$y_1 y_2 \cdots y_{l_m}$, and $c_m$; $P(t_i)$ is the probability of a particular tail,
$t_i$, for codeword $c_m$; and $h_{t_i}$ is the Hamming distance between the last
$L - l_m$ bits of y, $y_{l_m+1} y_{l_m+2} \cdots y_L$, and $t_i$. The metric derived from
(2.22) will be called the modified Massey metric. The following example will clarify some
of the points discussed above.

9 With slight change of notation.

10 As usual, $l_{i_j} = |c_{i_j}|$.

Example 2.2: Consider code $C_3$ given in Table 2.3. Since there are two codeword
lengths in this code, there are two "random" tail lengths. In practice there is only one,
since the tail for the longer codewords is of length 0. Hence, the possible tails for both
the short codewords are derived from the prefixes, of length 3, of all codewords. These
possible tails for a and b, together with their probabilities, are given in Table 2.4.



Source Symbol    Probability, P(A_i)    Codeword
-------------    -------------------    --------
a                0.35                   00000
b                0.30                   10110
c                0.10                   11001111
d                0.10                   01111111
e                0.05                   11011010
f                0.05                   01101010
g                0.03                   01011001
h                0.02                   11101001

Table 2.3: An eight-codeword $\alpha_1$-prompt code $C_3$



Possible tails for a and b    Probability
--------------------------    -----------
000                           0.35
101                           0.30
011                           0.15
110                           0.15
010                           0.03
111                           0.02

Table 2.4: Possible tails for code $C_3$



Suppose that the bit sequence 10011110 is received, and let p = 0.01. Then, for the
two short codewords, $P_0$ is given by

$$P_0 = [(0.35 + 0.3 + 0.15) \times 0.01^2 \times 0.99] + [0.15 \times 0.99^3] + [(0.03 + 0.02) \times 0.01 \times 0.99^2] = 0.1461$$

whereas for the long codewords, $P_0 = 1$.

Using equation (2.22), the calculated $\Pr(c_m, y)$ for all possible codewords and the
Hamming distance to the received bit sequence are given in Table 2.5. From this table we
can immediately deduce that the most likely symbol is e, even though the codeword for b is
also at a Hamming distance of 2 and b is more likely to occur than e. However, in this case
the tail 110 that would have to follow b is not very likely, and this tips the balance in
favour of e.



Source Symbol    Hamming Distance    Pr(c_m, y)
-------------    ----------------    --------------
a                3                   5.01 x 10^-8
b                2                   4.25 x 10^-6
c                3                   9.51 x 10^-8
d                4                   9.61 x 10^-10
e                2                   4.71 x 10^-6
f                5                   4.85 x 10^-12
g                5                   2.91 x 10^-12
h                6                   1.96 x 10^-14

Table 2.5: Probabilities for all codewords of $C_3$ given that we receive 10011110

Hence a third instantaneous decoding algorithm for α-prompt VLEC codes may now be
given. This will be called the tail decoding algorithm for α-prompt codes.

1. Let y be the next L bits of the input bit sequence (where L is the maximum codeword
length of the VLEC code C).

2. Calculate $\Pr(c_m, y)$ for all codewords $c_m$ in C.

3. Decode codeword $c_{m'}$ such that $\Pr(c_{m'}, y) \ge \Pr(c_m, y)$ for all
m = 1, 2, ..., s, where s is the number of codewords in C.

4. Discard the decoded $l_{m'}$ bits from the input sequence and repeat from Step 1.
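Steps 2 and 3 of the tail decoding algorithm can be illustrated on Example 2.2 with the
following Python sketch. It is our own illustration of the modified Massey metric, taken
here to be $\Pr(c_m, y) = P_m p^{h_m} (1-p)^{l_m - h_m} P_{0m}$ with $P_{0m}$ summed over
all possible tails; the helper names `tail_distribution` and `joint_probabilities` are
hypothetical. Tails are built from whole codewords followed, if necessary, by a codeword
prefix, each weighted by the corresponding source probabilities.

```python
# Code C3 of Table 2.3: symbol -> (probability, codeword).
SOURCE = {
    "a": (0.35, "00000"), "b": (0.30, "10110"),
    "c": (0.10, "11001111"), "d": (0.10, "01111111"),
    "e": (0.05, "11011010"), "f": (0.05, "01101010"),
    "g": (0.03, "01011001"), "h": (0.02, "11101001"),
}


def hamming(x: str, y: str) -> int:
    return sum(p != q for p, q in zip(x, y))


def tail_distribution(n: int, source=SOURCE) -> dict:
    """Probability of each n-bit tail made of codewords, the last possibly cut short."""
    if n == 0:
        return {"": 1.0}
    dist = {}
    for prob, word in source.values():
        if len(word) >= n:
            dist[word[:n]] = dist.get(word[:n], 0.0) + prob
        else:
            for rest, q in tail_distribution(n - len(word), source).items():
                dist[word + rest] = dist.get(word + rest, 0.0) + prob * q
    return dist


def joint_probabilities(y: str, p: float, source=SOURCE) -> dict:
    """Pr(c_m, y) for every codeword, given L received bits y and BSC error rate p."""
    L = max(len(word) for _, word in source.values())
    assert len(y) == L
    result = {}
    for symbol, (P_m, word) in source.items():
        l_m = len(word)
        h_m = hamming(y[:l_m], word)
        P_0m = sum(
            P_t * p ** hamming(y[l_m:], t) * (1 - p) ** (L - l_m - hamming(y[l_m:], t))
            for t, P_t in tail_distribution(L - l_m, source).items()
        )
        result[symbol] = P_m * p ** h_m * (1 - p) ** (l_m - h_m) * P_0m
    return result


pr = joint_probabilities("10011110", 0.01)
best = max(pr, key=pr.get)   # Step 3: the codeword with the largest Pr(c_m, y)
```

For the received sequence 10011110 and p = 0.01 the values in `pr` agree with Table 2.5 to
the precision quoted there, and `best` is the symbol e.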

2.9. Symbol Error Probability 

When assessing the performance of any error-correcting code, the output error
probability is always of interest. In general, error-correcting codes are used to give
protection to data which has already been source coded, and hence this error probability is
rarely given in terms of the actual source symbols being transmitted, but rather as a bit
error probability (assuming the code is binary). Since variable-length codes are used to
perform source coding, in order to judge the performance of VLEC codes we must determine
the symbol error probability, i.e. the probability of decoding a transmitted source symbol
in error. Unfortunately, this is not as straightforward to calculate as was the case with
fixed-length error-correcting codes. The problem lies in the fact that the number of
decoded source symbols may be different from the number sent. This occurs because of the
synchronisation losses in variable-length codes, even when the channel itself does not
insert or delete code symbols.

2.9.1. Levenshtein Distance 

Suppose that the transmitted message is bacdabd and that the decoded message is 
baaadabd, where a decoding error has been made with c being decoded as aa. So clearly 
in this case, the number of erroneous symbols in the decoded message is two. However, if 
we were to calculate the Hamming distance between the two strings, as is usually done for 
fixed-length codes, we encounter two problems. First, the two strings are not of the same 
length. We can solve this by truncating the longer string to make the two equal in length. 
However, the Hamming distance still will not give the correct number of errors. For 
instance, in our example the Hamming distance is five. Clearly, once the number of 
decoded symbols is not equal to that transmitted, the Hamming distance criterion will give 
a totally meaningless result. 

When dealing with codes capable of correcting deletion and insertion errors, 
Levenshtein [1965] introduced a new distance measure. We shall use this distance measure 
in order to define the symbol error probability in the case of VLEC codes. However, first 
we shall define the Levenshtein distance. 

Definition 2.4: Let $a_1$ and $a_2$ be two sequences of source symbols over A, not
necessarily of equal length. Then the Levenshtein distance between the two sequences,
denoted by $L(a_1, a_2)$, is the minimum number of insertions, deletions and substitutions
necessary to transform one sequence into the other¹¹.

11 Although $L(a_1, a_2) = L(a_2, a_1)$, we will always take the first listed sequence as
the reference sequence and the terms insertion, deletion and substitution are with reference
to this sequence.

It should be appreciated that the Levenshtein distance is not as easily computed as the
Hamming distance. Kruskal [1983] gives an algorithm to compute the Levenshtein distance
based on the principle of dynamic programming. If the two sequences are of length n, then
the complexity of this algorithm is $O(n^2)$. Masek and Paterson [1983] give an improved
algorithm for long sequences with complexity $O(n^2/\log n)$. However, both algorithms will
take an appreciable time to compute the Levenshtein distance of sequences consisting of a
few tens of thousands of symbols and hence are not practical to use when message lengths of
a few hundred thousand symbols are considered in simulation runs. A more practical
algorithm is given in the next section.
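For reference, a textbook dynamic-programming computation of the Levenshtein distance may
be sketched as below. This is a generic $O(n^2)$ formulation written by us for
illustration, not a transcription of the algorithm in Kruskal [1983]; the function name
`levenshtein` is our own.

```python
def levenshtein(ref: str, other: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into other."""
    # previous[j] holds the distance between the first i-1 symbols of ref
    # and the first j symbols of other.
    previous = list(range(len(other) + 1))
    for i, x in enumerate(ref, start=1):
        current = [i]
        for j, y in enumerate(other, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from ref
                current[j - 1] + 1,          # insert y into ref
                previous[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        previous = current
    return previous[-1]


sent, received = "bacdabd", "baaadabd"
distance = levenshtein(sent, received)
# Hamming distance after truncating the longer string, for comparison.
truncated_hamming = sum(p != q for p, q in zip(sent, received))
```

For the example of this section, `distance` is 2 whereas `truncated_hamming` is 5, in
agreement with the figures quoted above.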

Having defined a distance measure between two unequal-length sequences, it is now
quite easy to give a definition for the symbol error probability.

Definition 2.5: The symbol error probability (SEP) of the decoded source message $a_r$
when compared to the transmitted source message $a_t$ is given by the ratio
$L(a_t, a_r) / |a_t|$, where $|a_t|$ denotes the number of source symbols in $a_t$.

2.9.2. A Practical Algorithm to Evaluate the Symbol Error Probability 

In this algorithm we are going to assume that we know the codeword lengths mapped
to each source symbol and that the channel does not insert or delete bits (as is the case
for the BSC). Both of these assumptions are satisfied for most of the work presented in
this thesis.

Let the transmitted source message $a_t = a_{t_1} a_{t_2} \cdots a_{t_T}$ be encoded using
code C as $c_{t_1} c_{t_2} \cdots c_{t_T}$ and let the decoded codeword sequence be
$c_{r_1} c_{r_2} \cdots c_{r_R}$, which is mapped to the source message
$a_r = a_{r_1} a_{r_2} \cdots a_{r_R}$. As usual, let $l_i = |c_i|$, where $c_i \in C$.
Then, the following algorithm will calculate an approximate value for $L(a_t, a_r)$.

1. Let i, j = 1 and $L_d$ = 0.

2. If i > T, then increment $L_d$ by R-j+1 and stop.

3. If j > R, then increment $L_d$ by T-i+1 and stop.

4. If $a_{t_i} = a_{r_j}$ then increment i and j and repeat from Step 2, otherwise go to
Step 5.

5. If $|c_{t_i}| = |c_{r_j}|$ then increment i, j and $L_d$ and go back to Step 2, otherwise
proceed to Step 6.

6. Let $a_1 = a_{t_i} a_{t_{i+1}} \cdots a_{t_{i+u}}$ and
$a_2 = a_{r_j} a_{r_{j+1}} \cdots a_{r_{j+v}}$ such that u and v are the smallest integers
which satisfy $|c_{t_i} c_{t_{i+1}} \cdots c_{t_{i+u}}| = |c_{r_j} c_{r_{j+1}} \cdots c_{r_{j+v}}|$
or i+u = T and j+v = R. Increment $L_d$ by $L(a_1, a_2)$ and i, j by u+1 and v+1
respectively. Go back to Step 2.

At the end of this algorithm, $L(a_t, a_r) \approx L_d$. However, note that in any case
$L_d \ge L(a_t, a_r)$. This is a direct consequence of Definition 2.4.
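The six steps above may be sketched in Python as follows. This is our own illustrative
transcription using 0-based indices; the names `approx_levenshtein` and `length` are
hypothetical, and the codeword lengths used in the usage example (4 bits for a and b, 8 bits
for c and d) are assumed values chosen so that the decoding of c as aa is consistent with a
channel that neither inserts nor deletes bits.

```python
def levenshtein(ref, other) -> int:
    """Exact Levenshtein distance by dynamic programming."""
    previous = list(range(len(other) + 1))
    for i, x in enumerate(ref, start=1):
        current = [i]
        for j, y in enumerate(other, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]


def approx_levenshtein(sent, decoded, length) -> int:
    """Approximate L(sent, decoded); length[s] is the codeword length of symbol s."""
    T, R = len(sent), len(decoded)
    i = j = 0
    ld = 0
    while i < T and j < R:
        if sent[i] == decoded[j]:                        # Step 4
            i, j = i + 1, j + 1
        elif length[sent[i]] == length[decoded[j]]:      # Step 5
            i, j, ld = i + 1, j + 1, ld + 1
        else:                                            # Step 6
            u, v = i, j
            bits_t, bits_r = length[sent[u]], length[decoded[v]]
            while bits_t != bits_r:
                if bits_t < bits_r and u + 1 < T:
                    u += 1
                    bits_t += length[sent[u]]
                elif bits_r < bits_t and v + 1 < R:
                    v += 1
                    bits_r += length[decoded[v]]
                else:            # one message is exhausted: compare the remainders
                    u, v = T - 1, R - 1
                    break
            ld += levenshtein(sent[i:u + 1], decoded[j:v + 1])
            i, j = u + 1, v + 1
    return ld + (T - i) + (R - j)                        # Steps 2 and 3


LENGTH = {"a": 4, "b": 4, "c": 8, "d": 8}   # assumed codeword lengths
ld = approx_levenshtein("bacdabd", "baaadabd", LENGTH)
sep = ld / len("bacdabd")
```

Only the out-of-synchronisation stretch (c against aa) is passed to the exact routine; the
rest of the comparison is a symbol-by-symbol check. With the assumed lengths, `ld` is 2 and
the symbol error probability is 2/7.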

The algorithm given above is much faster, since essentially it is evaluating the 
Hamming distance between those parts of the sequences which are in synchronisation 
(since here there are no insertions or deletions of symbols) and only uses the Levenshtein 
distance between those parts of the sequences which are out of synchronisation. In general, 
these tend to be relatively short sequences and hence the complexity of calculating the 
Levenshtein distance in this case is much reduced. 

Having obtained an approximate value for $L(a_t, a_r)$, an approximate value for the
symbol error probability is easily obtained. However, we should point out that it seems to
us that this method reflects better what is really happening in practice and hence gives a
more realistic value for the symbol error probability. Having said this, simulation results
confirmed that in practice the difference between the approximate method and the exact
method is negligible for all codes considered.

2.10. Synchronisation-Error-Correcting Codes 

Another class of codes which has been considered in the literature is that of
synchronisation-error-correcting codes, i.e. codes which are capable of correcting insertion
and deletion (and possibly substitution) errors. As may be appreciated, these types of
errors are much more difficult to correct than simple substitution errors. Consequently,
all the codes designed to combat these types of errors are, to our knowledge, of fixed
length.

Sellers [1962] was the first to consider this type of code and gave a construction for
fixed-length codes capable of correcting single-bit synchronisation errors. His code was
constructed by inserting special symbols into a burst error-correcting code. Ullman [1966]






also gives a single synchronisation-error-correcting code. Levenshtein [1965a, 1965b,
1966] investigates some properties of synchronisation-error-correcting codes and proposes
a binary code capable of correcting / deletions or insertions of ones. Ullman [1966] gives 
upper and lower bounds on the redundancy required to correct synchronisation errors. A 
family of codes which can correct some substitution and synchronisation errors is given by
Calabi and Hartnett [1969a]. The authors also give some further characterisations for these
codes [Calabi & Hartnett, 1969b]. Tanaka and Kasai [1976] derive some sufficient
conditions, based on a generalised Levenshtein distance, for codes to correct
synchronisation errors and a method for constructing such codes. More recently, Hollmann
[1993] gave some further characterisations by separating the insertion and deletion errors
for synchronisation-error-correcting codes.

2.11. Conclusion 

We have seen that variable-length codes are more difficult to deal with than
fixed-length codes, even in the noiseless case. In that case, all that is required for a
fixed-length code to be useful is that it be non-singular. Variable-length codes, however,
also need to be uniquely decodable. To minimise the decoding delay, they must also be
instantaneously decodable.

For channels which admit insertion and deletion errors, we have seen that we need to 
design into both fixed and variable-length codes special characteristics which will enable 
the decoder to recover synchronisation. In the case of variable-length codes, these 
synchronisation problems are present even with only substitution errors, which in practice 
are more common. Hence, for variable-length codes used over a noisy channel there must 
always be some mechanism which enables the decoder to re-synchronise. This problem 
can be minimised if the variable-length code is protected by using an error-correcting code, 
which effectively would yield a noiseless channel for the variable-length code. Statistically
synchronisable variable-length codes may also offer good performance on a noisy channel. 
The property of self-synchronisation in some variable-length codes is attractive even for 






channels with insertion and deletion errors, since no further control logic is required to 
maintain synchronisation unlike for the case of fixed-length codes. 

Finally, we have introduced variable-length error-correcting codes, which have
additional in-built structure to correct substitution errors, and hence also have improved
synchronisation properties. A construction algorithm for perfect α-prompt codes was
given which could also be used to construct other α-prompt codes. Three instantaneous
decoding algorithms for the class of α-prompt codes were also given. All three of them,
with the possible exception of the tail decoding algorithm, consider the VLEC codes as
pseudo block codes. This limits the performance of these codes as we shall see in the next
chapter. Note that a performance comparison for these algorithms is deferred until Chapter
3, where a maximum likelihood decoding algorithm for VLEC codes is derived.






Chapter 3. 



Trellis Structure of Variable-Length Error-Correcting Codes



3.1. Introduction 

In this chapter we present a novel way of decoding VLEC codes by treating them as 
trellis codes. Consequently it is shown that in many respects they behave very much like 
convolutional codes and exhibit a form of "memory", although in the case of VLEC codes 
the memory is not related to an encoding shift register, but is due to the different codeword 
lengths. 

Using this representation, we give a maximum likelihood decoding algorithm based 
on the Viterbi algorithm and also derive a maximum a-posteriori (MAP) metric for use 
with this algorithm. 

There are other properties of VLEC codes similar to those of convolutional codes, 
like their free distance, constraint length and catastrophic behaviour, which are dealt with 
here. 

In this chapter we also give two methods, namely union bounds and computer
simulation, to determine the performance of these codes on the BSC, and we
compare these two methods. We also compare the performance of VLEC codes with
maximum likelihood and MAP decoding. Finally, we compare these with the decoding
algorithms presented in Chapter 2 and show that maximum likelihood decoding of VLEC
codes achieves an appreciable coding gain over the instantaneous algorithms.




The tree and trellis representations for VLEC codes appeared first in [Buttigieg & 
Farrell, 1993a], whereas the maximum a-posteriori metric for VLEC code was first derived 
in [Buttigieg & Farrell, 1993b]. Comparisons between maximum likelihood decoding for 
VLEC codes and instantaneous decoding were presented in [Buttigieg & Farrell, 1994b]. 

3.2. Maximum Likelihood Decoding 

Let C be a binary VLEC code ($s_1@L_1, s_2@L_2, \ldots, s_\sigma@L_\sigma$) as defined in Section 2.2.
Under noisy conditions, the decoder for C has two main problems: first, to determine the
codeword boundaries, and secondly, to determine the codeword values. These two
problems must be solved simultaneously in a maximum likelihood decoder. Hence, if we 
are to implement a maximum likelihood decoder, we must use a representation for VLEC 
codes which retains both the spatial and amplitude information. This can be achieved by 
using a tree structure. 

3.2.1. Tree Structure 

In this representation, the root node of the tree represents the start of the message. 
Each node in the tree is connected to s other nodes. The s branches connecting these nodes 
are each labelled with a different codeword of C. 

Definition 3.1: A path $p_i$ through the tree is any sequence of branches (codewords)
connecting the root node to some other node $i$.

Definition 3.2: The span of the path $p_i$, denoted by $\|p_i\|$, is the number of branches
(codewords) in the path. The length of the path $p_i$, denoted by $|p_i|$, is the total number of
bits in the path.

The nodes in the tree are organised in such a way that all nodes with equal path 
lengths fall on the same vertical line (assuming the tree grows horizontally). The first tree 
segment for C is shown in Figure 3.1. This is also the basic building block for the tree, 
because this structure is repeated at each new node generated. 







Figure 3.1: First tree segment for a VLEC code C (horizontal axis: path lengths $0, L_1, L_2, \ldots, L_\sigma$)

Example 3.1: Consider the VLEC code C4 given in Table 3.1. For this code $\sigma = 2$,
$s = 3$, $s_1 = 1$, $s_2 = 2$ and $L_1 = 3$, $L_2 = 4$. The tree diagram for this code, up to a maximum
path length of 12 bits, is shown in Figure 3.2.



Source Symbol    Codeword
a                000
b                0110
c                1011

Table 3.1: VLEC Code C4



Now, assume that the channel does not insert or delete channel symbols (bits), such
as the BSC for the binary case. Then, knowing the number of received bits $N$, the decoder
can determine the set of possible paths $P_N = \{p_i : |p_i| = N\}$ through the tree, which
corresponds to all the nodes on the same vertical line at bit position $N$. Then, with






probability one, the transmitted path $p_m \in P_N$. Hence, a maximum likelihood decoder need
only consider the subset $P_N$ of all the paths in the tree.

Let the codeword sequence associated with the path $p_i \in P_N$ be $(c_{i_1}, c_{i_2}, \ldots, c_{i_{\eta_i}})$ and let
$f_i = c_{i_1} c_{i_2} \cdots c_{i_{\eta_i}}$, where $\eta_i = \|p_i\|$. Then, by definition, $|f_i| = N$. Hence, the set




Figure 3.2: Tree diagram for code C4 up to length 12 bits (branch labels: 000, 0110, 1011; horizontal axis: path length in bits, with nodes at 0, 3, 4, 6, 7, 8, 9, 10, 11 and 12)




$F_N = \{f_i : |f_i| = N\}$ has a one-to-one mapping to $P_N$ given by $p_i \leftrightarrow f_i$.

Definition 3.3: The set $F_N = \{f_i : |f_i| = N\}$, $f_i = c_{i_1} c_{i_2} \cdots c_{i_{\eta_i}}$, $c_{i_j} \in C\ \forall j = 1, 2, \ldots, \eta_i$, is
the extended code of the VLEC code C of order $N$.

Note that the extended code $F_N$ is a fixed-length code with codeword length equal to
$N$. Assuming that all codewords of $F_N$ are equally probable (but see Section 3.3 below), it
can easily be proved that for maximum likelihood decoding of $F_N$ over the BSC, we need
to choose that codeword which is at the minimum Hamming distance to the received bit
sequence [Sklar, 1988]. Therefore, since the transmitted sequence of codewords is in $F_N$
with probability one, then by performing maximum likelihood decoding on $F_N$ we would
also be achieving maximum likelihood decoding of the associated VLEC code C.
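
As an illustration of the argument above, the following Python sketch enumerates the extended code $F_N$ of code C4 by growing the tree exhaustively and then picks the member closest in Hamming distance to the received bits. It is only a brute-force sketch under the equal-probability assumption; the function names (`extended_code`, `ml_decode_brute_force`) and the test sequence are illustrative choices, not part of the original text, and the exponential growth of the enumeration is exactly what the trellis of the next section avoids.

```python
# Code C4 from Table 3.1, as a mapping from source symbol to codeword.
C4 = {"a": "000", "b": "0110", "c": "1011"}


def extended_code(code, n):
    """Return the extended code F_N as {bit string: tuple of source symbols}.

    Every path through the tree whose length is exactly n bits contributes
    one entry. The result is empty when no codeword sequence has length n.
    """
    result = {}

    def grow(bits, symbols):
        if len(bits) == n:
            result[bits] = symbols
            return
        for sym, cw in code.items():
            if len(bits) + len(cw) <= n:
                grow(bits + cw, symbols + (sym,))

    grow("", ())
    return result


def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))


def ml_decode_brute_force(code, received):
    """Pick the member of F_N nearest to the received bits (BSC, equal priors).

    Raises ValueError if F_N is empty for this length.
    """
    f_n = extended_code(code, len(received))
    best = min(f_n, key=lambda f: hamming(f, received))
    return f_n[best], hamming(best, received)


if __name__ == "__main__":
    # "ac" is encoded as 0001011; the fifth bit is flipped here.
    print(sorted(extended_code(C4, 7)))
    print(ml_decode_brute_force(C4, "0001111"))
```

For $N = 7$ the extended code of C4 has four members (the encodings of ab, ac, ba and ca), so the search is trivial here; the number of members grows rapidly with $N$.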

3.2.2. Trellis Structure 

In the previous section we have derived a maximum likelihood decoding algorithm 
for VLEC codes using the tree structure. However, one problem with this structure is that 
it grows exponentially with increasing N. To overcome this problem, we reduce the tree 
structure to a trellis structure in a similar manner as is done for convolutional codes [Clark 
& Cain, 1981]. We can achieve this by combining all nodes on the same vertical line (i.e. 
all the nodes with paths of the same length) into one state in the trellis. We shall prove in 
Section 3.2.2.2 that the trellis structure so constructed can still be used to perform 
maximum likelihood decoding of VLEC codes. 

Definition 3.4: Let $S_i$ represent the state in the trellis diagram for the VLEC code C
corresponding to bit position $i$ within the encoded message. Then, a path $p_i^j$ in the trellis
is a particular choice of transitions to traverse from state $S_0$ to some other state $S_i$, where $j$
is an arbitrary path index, since there may be more than one path to state $S_i$.

It should be obvious from the above definition that $|p_i^j| = i$. Note that a path may be
characterised in two equivalent ways. It could either be given as a sequence of source
symbols, which would constitute the source message, or equivalently, it could be given as a
sequence of codewords, which would constitute the encoded message.






Definition 3.5: A path $p_i^j = (c_{k_1}, c_{k_2}, \ldots, c_{k_\eta})$ is extended into $s$ new paths, $p_{i+l_1}^{j_1}, p_{i+l_2}^{j_2},
\ldots, p_{i+l_s}^{j_s}$, where $l_k = |c_k|$, by adjoining to it all possible codewords in the code C, i.e. $p_{i+l_1}^{j_1} = (c_{k_1}, c_{k_2}, \ldots, c_{k_\eta},
c_1)$, $p_{i+l_2}^{j_2} = (c_{k_1}, c_{k_2}, \ldots, c_{k_\eta}, c_2)$, $\ldots$, $p_{i+l_s}^{j_s} = (c_{k_1}, c_{k_2}, \ldots, c_{k_\eta}, c_s)$.

Hence the number of transitions¹ (source symbols or codewords) in the extended
paths is one more than that in the original path. There are $\sigma$ new distinct destination states
in the extended paths, $S_{i+L_1}, S_{i+L_2}, \ldots, S_{i+L_\sigma}$, respectively.

Definition 3.6: Two states $S_i$ and $S_j$, with $i < j$, are said to be consecutive iff there is
no state $S_k$ such that $i < k < j$.

3.2.2.1. Trellis Construction Algorithm

1. Let $S_0$ be the first state in the trellis.

2. From each existing state $S_i$ emit transitions for each of the codewords in C with
destination states $S_{i+L_1}, S_{i+L_2}, \ldots, S_{i+L_\sigma}$. If any of these states do not already exist, then
these are created, otherwise the transitions are made to existing states.

3. The states are grouped into stages, each stage consisting of consecutive states starting
with $S_{jL_\sigma}$, $j = 0, 1, 2, \ldots$, where a stage of the trellis is defined to be any vertical group
of states within the trellis.

Note that step 3 is not essential for the decoding algorithm as will be given in Section 
3.2.2.2 and is only required to force transitions to be either within the same stage or from 
the previous stage only. In this case, the number of states in a stage is directly related to the 
constraint length of the VLEC code, as we shall see in Section 3.4.2. 
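
The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis's own implementation: states are identified simply by their bit position, only the set of states (not the labelled transitions) is built, and the names `trellis_states` and `stages` are hypothetical.

```python
def trellis_states(lengths, n_max):
    """Bit positions of all states reachable from S_0, up to position n_max.

    Steps 1 and 2: start from S_0 and, from every existing state S_i, create
    the destination states S_{i+L} for each distinct codeword length L.
    """
    states = {0}
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for length in lengths:
            j = i + length
            if j <= n_max and j not in states:
                states.add(j)
                frontier.append(j)
    return sorted(states)


def stages(states, l_max):
    """Step 3: group states into stages, stage j starting at position j * l_max."""
    grouped = {}
    for s in states:
        grouped.setdefault(s // l_max, []).append(s)
    return [grouped[k] for k in sorted(grouped)]


if __name__ == "__main__":
    # Code C4 has codeword lengths 3 and 4, so the longest length is 4.
    positions = trellis_states([3, 4], 15)
    print(positions)
    print(stages(positions, 4))
```

With the lengths of code C4, bit positions 1, 2 and 5 are never reached, and every stage from position 8 onwards contains four states, which is where the trellis starts to repeat.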

Example 3.2: Again, consider code C4 given in Table 3.1. The trellis diagram for this 
code is shown in Figure 3.3 up to bit position 15. Notice that same length codewords give 
rise to parallel transitions. Compare this with the equivalent tree diagram of Figure 3.2. 
Whereas the trellis starts to repeat itself after state S9, the tree diagram continues to grow 
exponentially. 



¹ Note that the terms 'transition' and 'branch' will be used interchangeably throughout this text.





Figure 3.3: Trellis diagram for code C4 



Notice that the states in Figure 3.3 are arranged in such a way that the transitions are 
either within the same stage or from the previous stage only, as enforced by step 3 in the 
construction algorithm. 

An alternative trellis diagram for code C4 is shown in Figure 3.4. Here, the fact that 
each state in the trellis is identical is being emphasised. Note that Figures 3.3 and 3.4 are 
equivalent and their only difference lies in the way in which the states are laid out. 

3.2.2.2. Modified Viterbi Algorithm 

We shall now give a modified version of the Viterbi decoding algorithm [Viterbi,
1967] for VLEC codes. A decoding algorithm for variable-length trellises, as used for
run-length codes, is also discussed by Belongie and Heegard [1993].

Let $y = y_1 y_2 \cdots y_N$ be the received $N$-bit sequence and denote the metric of the
surviving path at state $S_i$ by $M_i$.

Figure 3.4: Alternative trellis diagram for code C4 (transition labels: 000, 0110, 1011)






1. Assign $M_0 = 0$ and $M_i = -\infty$, $\forall\, i > 0$. Let $S_i$ denote the current state and initially put
$i = 0$.

2. For all codewords $c_j \in C$ evaluate the branch metric $m_j = -H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1})$, where $l_j = |c_j|$. Flag
$S_{i+l_j}$ as a visited state. If $m_j + M_i > M_{i+l_j}$ then store this codeword for the transition
$S_i \to S_{i+l_j}$ (overwriting any other previously stored transitions to state $S_{i+l_j}$) and make
$M_{i+l_j} = m_j + M_i$.

3. Increment $i$ to the next visited state and repeat step 2 until $i > N - L_1$.

4. Decode the message corresponding to the codeword sequence represented by the
surviving path to state $S_N$.
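
The four steps above can be sketched in Python as follows. This is a hedged illustration, not the thesis's implementation: the function name `viterbi_vlec` is hypothetical, states are indexed by bit position, and the received bits are held in a zero-based string so that the bits compared on a transition out of state $S_i$ are `received[i:i + len(cw)]`. Transitions that would overshoot $S_N$ are skipped, which is an assumption made here so that only paths of exactly $N$ bits survive.

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))


def viterbi_vlec(code, received):
    """Maximum likelihood decoding of a VLEC code over the BSC (equal priors).

    code maps source symbols to codewords. Returns (symbols, metric) for the
    surviving path to state S_N, or None if no path of exactly N bits exists.
    """
    n = len(received)
    neg_inf = float("-inf")
    metric = [neg_inf] * (n + 1)   # Step 1: M_0 = 0, M_i = -infinity otherwise
    back = [None] * (n + 1)        # stored transition: (previous state, symbol)
    metric[0] = 0

    for i in range(n):             # Step 3: visit states in increasing bit position
        if metric[i] == neg_inf:   # state never reached, nothing to extend
            continue
        for sym, cw in code.items():          # Step 2: try every codeword
            j = i + len(cw)
            if j > n:
                continue
            m = metric[i] - hamming(cw, received[i:j])
            if m > metric[j]:                  # keep only the surviving path
                metric[j] = m
                back[j] = (i, sym)

    if n > 0 and back[n] is None:
        return None

    symbols = []                   # Step 4: trace back from S_N
    i = n
    while i > 0:
        i, sym = back[i]
        symbols.append(sym)
    return symbols[::-1], metric[n]


if __name__ == "__main__":
    C4 = {"a": "000", "b": "0110", "c": "1011"}
    # "ac" is encoded as 0001011; the fifth bit is flipped here.
    print(viterbi_vlec(C4, "0001111"))
```

Because every transition moves forward in bit position, processing states in increasing order guarantees that all transitions into a state have been examined before that state is extended.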

Theorem 3.1: The modified Viterbi decoding algorithm given above achieves 
maximum likelihood decoding for VLEC codes (assuming all paths are equally probable; 
but see Section 3.3 below). 

Proof: In step 4 of the algorithm above, we are decoding the surviving sequence of
codewords to state $S_N$. Hence, the number of bits in the decoded codeword sequence is
equal to $N$ and therefore the decoded sequence is a codeword of $F_N$. Now we need to prove
that it is also that sequence with the minimum Hamming distance to $y$.

Suppose that the minimum distance path to state $S_N$, $p_N^m$, is eliminated at some
previous state $S_j$, as shown in Figure 3.5. This implies that the distance between the
received bit sequence and $p_N^m$ at state $S_j$ is greater than that for the surviving path². Now,
consider that we follow the surviving path at state $S_j$ with the remaining part of $p_N^m$ from
state $S_j$ to state $S_N$. Then, in this case, the distance for the complete surviving path will be
less than that for the minimum distance path, $p_N^m$, which is a contradiction. Hence, the
surviving path at state $S_N$ must be the one with minimum distance to the received bit
sequence (i.e. the one with maximum metric) and this is the maximum likelihood path. ■



² Notice that, due to the negative sign in front of the Hamming distance in step 2, the maximum metric path is
the path with the minimum distance.







Figure 3.5: Proof of maximum likelihood decoding 

3.3. Maximum A-Posteriori Metric 

In Section 3.2 we assumed that all the paths to state $S_N$ were equally probable. In
practice this is not true. Even if the codewords of C occur with equal probabilities, which
again is in general not true for VLEC codes, the paths to state $S_N$ may have different
probabilities. The reason for this is that these paths may have a different number of
codewords. In Theorem 3.2 we drop this assumption to obtain a maximum a-posteriori
(MAP) metric for VLEC codes.

Theorem 3.2: For MAP decoding of VLEC codes, the branch metric $m_j$ between
states $S_i$ and $S_{i+l_j}$ used in the modified Viterbi algorithm is given by

$$m_j = [\log p - \log(1-p)]\, H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1}) + \log P(c_j)$$

where $p$ is the cross-over probability of the BSC, $H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1})$ is the Hamming
distance between codeword $c_j$ and the corresponding received bits starting at bit position $i$,
and $P(c_j)$ is the probability of occurrence of $c_j$.

Proof. Let $y$ denote the received $N$-bit encoded message and $f_j \in F_N$, $\forall\, j = 1, 2, \ldots,
f_N$, where $F_N$ is the extended code of order $N$ of the VLEC code C and $f_N$ is the cardinality
of $F_N$. For MAP decoding, we need to choose that codeword $f_m$ such that

$$P(f_m|y) \geq P(f_i|y) \qquad (3.1)$$

$\forall\, i = 1, 2, \ldots, f_N$ and $1 \leq m \leq f_N$.






Now, for the BSC, we have

$$P(y|f_i) = p^{h_i}(1-p)^{N-h_i} \qquad (3.2)$$

where $p$ is the cross-over probability of the BSC and $h_i = H(y, f_i)$.
Also, using Bayes' theorem [Papoulis, 1965], we have

$$P(f_m|y) = \frac{P(y|f_m)\,P(f_m)}{\sum_{i=1}^{f_N} P(y|f_i)\,P(f_i)} \qquad (3.3)$$

where $P(f_i)$ is the probability that $f_i$ is transmitted.

Substituting for $P(y|f_i)$ and $P(y|f_m)$ in equation (3.3), we have

$$P(f_m|y) = \frac{p^{h_m}(1-p)^{N-h_m}\,P(f_m)}{\sum_{i=1}^{f_N} p^{h_i}(1-p)^{N-h_i}\,P(f_i)}. \qquad (3.4)$$

Hence, by expression (3.1), for MAP decoding we need to choose $f_m$ so as to
maximise equation (3.4). Now, the denominator of (3.4) is constant for all $f_m$. Hence, to
maximise $P(f_m|y)$, we only need to maximise the numerator of (3.4). Instead of maximising
$P(f_m|y)$, it is more convenient to maximise its logarithm. This can be done since the
logarithmic function is monotonically increasing. Therefore, we need to choose $f_m$ to
maximise

$$\log\left[p^{h_m}(1-p)^{N-h_m}\,P(f_m)\right] = h_m[\log p - \log(1-p)] + N\log(1-p) + \log P(f_m). \qquad (3.5)$$

In addition, since $N\log(1-p)$ is the same for all $f_m$, we only need to choose $f_m$ so
as to maximise

$$h_m[\log p - \log(1-p)] + \log P(f_m). \qquad (3.6)$$

The metric given in expression (3.6) is for complete paths to state $S_N$. In order to
apply this metric in the modified Viterbi decoding algorithm as given in Section 3.2.2.2, we
need to modify it for single transitions in the trellis, which represent single codewords.

Let $f_i$ be made up of $\eta_i$ codewords of C, i.e. $f_i = c_{i_1} c_{i_2} \cdots c_{i_{\eta_i}}$, where $c_{i_j}$ is the $j$th
codeword of the $i$th path to state $S_N$. Define the decomposition of $y$ with respect to $f_i$ to be
$y = y_{i_1} y_{i_2} \cdots y_{i_{\eta_i}}$, where $|y_{i_j}| = |c_{i_j}|$ for all $j = 1, 2, \ldots, \eta_i$. Note that $y_{i_j}$ represents that part of




the received bit sequence which is compared with the $j$th transition of the $i$th path to state
$S_N$. It is very easy to see that

$$h_i = H(y, f_i) = \sum_{j=1}^{\eta_i} H(y_{i_j}, c_{i_j}). \qquad (3.7)$$

Also, assuming that the source is memory-less, then the probability of $f_i$, $P(f_i)$, is
given by

$$P(f_i) = P(c_{i_1})\,P(c_{i_2}) \cdots P(c_{i_{\eta_i}})$$

$$\Rightarrow \log P(f_i) = \sum_{j=1}^{\eta_i} \log P(c_{i_j}). \qquad (3.8)$$

Therefore, substituting for $h_m$ and $P(f_m)$ in the metric given by expression (3.6), for
MAP decoding we need to maximise

$$[\log p - \log(1-p)] \sum_{j=1}^{\eta_m} H(y_{m_j}, c_{m_j}) + \sum_{j=1}^{\eta_m} \log P(c_{m_j}). \qquad (3.9)$$

Hence, we need to choose at each state, that codeword $c_{m_j}$ which maximises the
running sum of expression (3.10).

$$[\log p - \log(1-p)]\, H(y_{m_j}, c_{m_j}) + \log P(c_{m_j}) \qquad (3.10)$$

Therefore, for MAP decoding the branch metric $m_j$ used in the modified Viterbi
algorithm given in Section 3.2.2.2 (step 2) is modified to

$$m_j = [\log p - \log(1-p)]\, H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1}) + \log P(c_j). \qquad (3.11)$$

■

If $P(c_i) = P(c_j)$ and $|c_i| = |c_j|$, $\forall\, c_i, c_j \in C$, then the MAP metric given by equation
(3.11) reduces to $m_j = [\log p - \log(1-p)]\, H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1})$ since in this case the term
$\sum_{j=1}^{\eta_m} \log P(c_{m_j})$ in expression (3.9) will be a constant and hence may be ignored. If we
assume that $0 < p < 0.5$³, then $[\log p - \log(1-p)] < 0$. Hence in this case $m_j = -kH(c_j,
y_i y_{i+1} \cdots y_{i+l_j-1})$ where $k$ is a positive constant for a given $p$. Since $k$ is a constant for all
codewords, then it can be ignored and we have $m_j = -H(c_j, y_i y_{i+1} \cdots y_{i+l_j-1})$. This is the same
as the maximum likelihood metric given in Section 3.2.2.2 since the two conditions given
above ensure that all paths through the trellis to state $S_N$ are equally probable.
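
As a small numerical illustration of equation (3.11), the following Python sketch evaluates the MAP branch metric for a single transition. The function name `map_branch_metric`, the cross-over probability 0.1 and the codeword probability 0.5 are assumptions chosen for the example only; natural logarithms are used, which does not affect the decision since any base gives the same ordering.

```python
import math


def map_branch_metric(codeword, segment, p, prob):
    """MAP branch metric of equation (3.11) for one transition.

    codeword : the codeword c_j labelling the transition
    segment  : the received bits compared with it (same length as codeword)
    p        : cross-over probability of the BSC, 0 < p < 0.5
    prob     : probability of occurrence P(c_j) of the codeword
    """
    h = sum(a != b for a, b in zip(codeword, segment))
    return (math.log(p) - math.log(1 - p)) * h + math.log(prob)


if __name__ == "__main__":
    # With no bit errors only the log P(c_j) term remains.
    print(map_branch_metric("000", "000", 0.1, 0.5))
    # Each disagreeing bit adds log(p / (1 - p)), which is negative for p < 0.5.
    print(map_branch_metric("000", "001", 0.1, 0.5))
```

A more probable codeword therefore starts with a less negative metric, and each bit disagreement lowers the metric by the same amount for every codeword.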

³ We can always assume that $p$ is within this range, since if it is greater than 0.5 then we can simply invert the
0's and 1's at the output of the BSC.




3.4. Some Properties of VLEC codes 

In the previous sections we have established that we can consider VLEC codes to be 
trellis codes, their "memory" arising not from some storage element, but from the spatial 
information. Consequently, the properties of VLEC codes should be similar to those of 
trellis codes or convolutional codes, which are a special class of time-invariant linear trellis 
codes. 

3.4.1. Free Distance 

One of the most important parameters for convolutional codes when using maximum 
likelihood decoding is the free distance. As we shall see in Section 3.5, this is also an 
important parameter for VLEC codes which determines the performance for the code. 

Definition 3.7: The minimum block distance for length $L_k$ of a VLEC code C is
defined as the minimum Hamming distance between all codewords with the same length
$L_k$, i.e.

$$b_k = \min\{H(c_i, c_j) : c_i, c_j \in C,\ i \neq j \text{ and } |c_i| = |c_j| = L_k\} \qquad (3.12)$$

There are $\sigma$ different minimum block distances, one for each different codeword
length. However, if for some length $L_k$ there is only one codeword, i.e. $s_k = 1$, then in this
case the minimum block distance for length $L_k$ is undefined.

Definition 3.8: The overall minimum block distance, $b_{\min}$, of a VLEC code C is
defined as the minimum value of $b_k$ over all $k = 1, 2, \ldots, \sigma$.

Definition 3.9: The diverging distance between two codewords $c_i = c_{i_1} c_{i_2} \cdots c_{i_{l_i}}$ and
$c_j = c_{j_1} c_{j_2} \cdots c_{j_{l_j}}$, $D(c_i, c_j)$, where $c_i, c_j \in C$, $l_i = |c_i|$ and $l_j = |c_j|$, with $l_i > l_j$, is defined as

$$D(c_i, c_j) = H(c_{i_1} c_{i_2} \cdots c_{i_{l_j}},\ c_{j_1} c_{j_2} \cdots c_{j_{l_j}}). \qquad (3.13)$$

Note that $D(c_i, c_j) = D(c_j, c_i)$.

The minimum diverging distance of the VLEC code C, $d_{\min}$, is the minimum value of all
the diverging distances between all possible pairs of unequal length codewords of C, i.e.

$$d_{\min} = \min\{D(c_i, c_j) : c_i, c_j \in C,\ |c_i| \neq |c_j|\} \qquad (3.14)$$




Definition 3.10: The converging distance between two codewords $c_i = c_{i_1} c_{i_2} \cdots c_{i_{l_i}}$ and
$c_j = c_{j_1} c_{j_2} \cdots c_{j_{l_j}}$, $C(c_i, c_j)$, where $c_i, c_j \in C$, $l_i = |c_i|$ and $l_j = |c_j|$, with $l_i > l_j$, is defined as

$$C(c_i, c_j) = H(c_{i_{l_i-l_j+1}} c_{i_{l_i-l_j+2}} \cdots c_{i_{l_i}},\ c_{j_1} c_{j_2} \cdots c_{j_{l_j}}). \qquad (3.15)$$

Again, note that $C(c_i, c_j) = C(c_j, c_i)$.

The minimum converging distance of the VLEC code C, $c_{\min}$, is the minimum value of all
the converging distances between all possible pairs of unequal length codewords of C, i.e.

$$c_{\min} = \min\{C(c_i, c_j) : c_i, c_j \in C,\ |c_i| \neq |c_j|\} \qquad (3.16)$$

A more complete description of a VLEC code may now be given by extending the
notation introduced in Section 2.3. A VLEC code C ($s_1@L_1, b_1;\ s_2@L_2, b_2;\ \ldots;\ s_\sigma@L_\sigma, b_\sigma;\
d_{\min}, c_{\min}$) is one which has $s_i$ codewords of length $L_i$ with minimum block distance for
length $L_i$, $b_i$, for all $i = 1, 2, \ldots, \sigma$, where $\sigma$ is the number of different codeword lengths, and
a minimum diverging distance $d_{\min}$ and minimum converging distance $c_{\min}$. An undefined
minimum block distance for some length is denoted by a '-'.

Definition 3.11: The free distance, $d_{\text{free}}$, of a VLEC code C is defined to be the
minimum Hamming distance in the set of all arbitrarily long paths that diverge from some
common state $S_i$ and converge again in another common state $S_j$, $j > i$. More formally

$$d_{\text{free}} = \min\{H(f_i, f_j) : f_i, f_j \in F_N,\ N = 1, 2, \ldots, \infty\} \qquad (3.17)$$

where $F_N$ is the extended code of order $N$ of C⁴.

Theorem 3.3: The free distance of a VLEC code C is bounded by

$$d_{\text{free}} \geq \min(b_{\min},\ d_{\min} + c_{\min}) \qquad (3.18)$$

Proof. Following Definition 3.11 for the free distance, there are two cases to
consider (see Figure 3.6).

Case 1: If the two transitions emanating from the first state $S_i$ are due to codewords
of the same length, then these may give rise to two different paths from state $S_i$ to $S_j$ arising
from two parallel transitions from state $S_i$ to some state $S_k$ and the same transitions from
state $S_k$ to the final state $S_j$. Hence the free distance in this case will be equal to the overall
minimum block distance $b_{\min}$.

⁴ Note that for certain values of $N$ the set $F_N$ may be empty.








Figure 3.6: Proof of Theorem 3.3 

Case 2: If the transitions emanating from the first state $S_i$ are due to codewords of
differing lengths, then the next states must be different. The distance between the two
paths must therefore be at least equal to the diverging distance between the two codewords.
However, at some point the two paths must re-converge to some common state. Therefore,
the distance between the two paths is increased by the converging distance. Hence the
minimum distance for Case 2 is at least $d_{\min} + c_{\min}$.

The theorem follows by combining the two cases. ■ 
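
The quantities in Definitions 3.7 to 3.10 and the bound of Theorem 3.3 are easy to evaluate mechanically. The Python sketch below is an illustration only; all function names are hypothetical, and treating an undefined minimum block distance as imposing no constraint on the bound is an assumption made here.

```python
from itertools import combinations


def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))


def block_distances(codewords):
    """Minimum block distance per length; None where only one codeword exists."""
    by_length = {}
    for cw in codewords:
        by_length.setdefault(len(cw), []).append(cw)
    return {
        length: min(hamming(x, y) for x, y in combinations(group, 2))
        if len(group) > 1 else None
        for length, group in by_length.items()
    }


def diverging(ci, cj):
    """Distance between the shorter codeword and the prefix of the longer one."""
    short, long_ = sorted((ci, cj), key=len)
    return hamming(long_[:len(short)], short)


def converging(ci, cj):
    """Distance between the shorter codeword and the suffix of the longer one."""
    short, long_ = sorted((ci, cj), key=len)
    return hamming(long_[-len(short):], short)


def unequal_pairs(codewords):
    return [(x, y) for x, y in combinations(codewords, 2) if len(x) != len(y)]


def min_diverging(codewords):
    return min(diverging(x, y) for x, y in unequal_pairs(codewords))


def min_converging(codewords):
    return min(converging(x, y) for x, y in unequal_pairs(codewords))


def free_distance_bound(codewords):
    """Lower bound min(b_min, d_min + c_min), skipping undefined terms."""
    terms = [b for b in block_distances(codewords).values() if b is not None]
    terms = [min(terms)] if terms else []
    if unequal_pairs(codewords):
        terms.append(min_diverging(codewords) + min_converging(codewords))
    return min(terms)


if __name__ == "__main__":
    c4 = ["000", "0110", "1011"]   # code C4 of Table 3.1
    print(block_distances(c4), min_diverging(c4), min_converging(c4))
    print(free_distance_bound(c4))
```

For code C4 this gives a minimum block distance of 3 for length 4, $d_{\min} = 2$ and $c_{\min} = 2$, so the bound evaluates to 3.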

3.4.2. Constraint Length 

Another important parameter in convolutional codes is the constraint length [Lin & 
Costello, 1983]. There are several different definitions for the constraint length in the case 
of convolutional codes, but the one that seems most adaptable to VLEC codes is the one 
related to the number of stages in the shift registers for the convolutional code [Viterbi, 
1971]. If we call this number $K$, then the number of states in the state diagram for the
convolutional code will be $2^K$.

Similarly, we may define a constraint length for VLEC codes based on the "memory" 
of the VLEC code. 

Definition 3.12: Let $\mu$ be the minimum number of consecutive states in the repetitive
part of the trellis diagram for a VLEC code C required to build up all subsequent states.
Then, the constraint length, $K$, of C is defined to be

$$K \triangleq \log_2 \mu. \qquad (3.19)$$

Using this definition, $\mu$ is the number of states in any stage of the trellis for the
VLEC code if this is constructed using the algorithm given in Section 3.2.2.1.




The following lemma is useful to prove Theorem 3.4, which gives a relationship 
between the constraint length of a VLEC code C and the lengths of its codewords. 

Lemma 3.1: In the repetitive part of the trellis diagram of a VLEC code C, two states
$S_i$ and $S_j$, $j > i$, are consecutive iff $j = i + g$, where $g$ is the greatest common divisor (gcd) of
all the different codeword lengths in C.

Proof. Let $N$ be the number of bits in a path. Then $N$ is given by

$$N = n_1 L_1 + n_2 L_2 + \cdots + n_\sigma L_\sigma \qquad (3.20)$$

where $n_i$ is the number of codewords in the path with length $L_i$, $i = 1, 2, \ldots, \sigma$. Let
$g = \gcd(L_1, L_2, \ldots, L_\sigma)$, then there exist positive integers $K_i$ such that $L_i = K_i g$, $\forall\, i = 1, 2, \ldots,
\sigma$. Therefore, we can write equation (3.20) as

$$N = \sum_{i=1}^{\sigma} n_i K_i g = g \sum_{i=1}^{\sigma} n_i K_i. \qquad (3.21)$$

Hence, $N$ must be a multiple of $g$. Therefore, for sufficiently large $N$ (when the trellis
starts to repeat itself), the difference between two consecutive states is $g$ bits. ■

Theorem 3.4: The constraint length, $K$, of a VLEC code C with codeword lengths $L_1,
L_2, \ldots, L_\sigma$ with $L_1 < L_2 < \cdots < L_\sigma$, is given by

$$K = \log_2\left(\frac{L_\sigma}{\gcd(L_1, L_2, \ldots, L_\sigma)}\right) \qquad (3.22)$$

Proof. The state $S_i$ representing bit position $i$ is built up from the transitions arising
from the states $S_{i-L_1}, S_{i-L_2}, \ldots, S_{i-L_\sigma}$. Therefore, the most distant state is $S_{i-L_\sigma}$. Hence, at
most, we need to store $L_\sigma$ states to build the future states. But, by Lemma 3.1, the
difference between the bit positions represented by any two consecutive states is given by
$g = \gcd(L_1, L_2, \ldots, L_\sigma)$. Hence, the minimum number of states required to build up all
subsequent states in the repetitive part of the trellis, $\mu$, is $L_\sigma / g$. But, by definition,
$K = \log_2 \mu$. Hence the theorem is proved. ■
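
Equation (3.22) can be checked numerically with a few lines of Python. This is a minimal sketch; the function name `constraint_length` is illustrative, and it returns $\mu$ and $g$ alongside $K$ purely for convenience.

```python
import math
from functools import reduce


def constraint_length(lengths):
    """Return (K, mu, g) for a VLEC code with the given codeword lengths.

    g  : gcd of all codeword lengths (spacing of consecutive states)
    mu : number of states per stage, L_sigma / g
    K  : constraint length, log2(mu)
    """
    g = reduce(math.gcd, lengths)
    mu = max(lengths) // g
    return math.log2(mu), mu, g


if __name__ == "__main__":
    # Code C4 of Table 3.1 has codeword lengths 3 and 4.
    print(constraint_length([3, 4]))
```

For code C4 the gcd is 1 and the longest codeword has 4 bits, so each stage holds four states and $K = 2$.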

3.4.3. Catastrophic Codes 

As with convolutional codes, VLEC codes may exhibit catastrophic behaviour if not 
properly designed. A VLEC code is said to be catastrophic if a finite number of errors on 
the channel may cause an infinite number of decoded symbol errors. 




Example 3.3: Consider code C5 (1@4,-; 1@5,-; 1@6,-; 2,2) given in Table 3.2. If the
message ccc... is transmitted and there are three bit errors, in the first, fourth and fifth bit
positions, then the decoder will never re-synchronise and the output will be baaa..., as
shown in Figure 3.7. Hence, in this case, three bit errors will cause an infinite number of
symbol errors and thus C5 is catastrophic.



Source Symbol    Codeword
a                0101
b                00110
c                101010

Table 3.2: An example for a catastrophic VLEC code C5



Transmitted message:  c c c ...
Encoded message:      101010 101010 101010 ...
Received message:     00110 0101 0101 0101 0...
Decoded message:      b a a a ...

Figure 3.7: Catastrophic behaviour of C5
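
The bit pattern of Example 3.3 can be reproduced with a short Python sketch. It encodes the message ccc with code C5, flips the first, fourth and fifth bits, and then splits the corrupted stream into codewords as if it were error free. The helper names (`encode`, `flip`, `parse_prefix`) are hypothetical, and the greedy prefix parser is only a stand-in used to expose the alternative codeword sequence; it is not one of the decoders described in the text.

```python
# Code C5 from Table 3.2.
C5 = {"a": "0101", "b": "00110", "c": "101010"}


def encode(code, message):
    """Concatenate the codewords of the source symbols in message."""
    return "".join(code[s] for s in message)


def flip(bits, positions):
    """Flip the bits at the given 1-based positions."""
    out = list(bits)
    for pos in positions:
        out[pos - 1] = "1" if out[pos - 1] == "0" else "0"
    return "".join(out)


def parse_prefix(code, bits):
    """Split bits into codewords from the left; return (symbols, leftover bits).

    Parsing stops at the first position where no codeword matches.
    """
    symbols, i = [], 0
    while i < len(bits):
        for sym, cw in code.items():
            if bits.startswith(cw, i):
                symbols.append(sym)
                i += len(cw)
                break
        else:
            break
    return "".join(symbols), bits[i:]


if __name__ == "__main__":
    received = flip(encode(C5, "ccc"), [1, 4, 5])
    print(received)
    print(parse_prefix(C5, received))
```

The corrupted stream begins with the codeword for b and continues as a run of codewords for a, so it stays one bit out of step with the transmitted sequence of c codewords for as long as the message continues.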



One may notice that the above problem is similar to the problem of bounded
synchronisation delay encountered in Chapter 2. In fact, it is obvious that if a VLEC code
is catastrophic, then the code does not have a finite synchronisation delay as defined in
Definition 2.1 on page 39. However, the converse is not true. Consider the VLEC code C6
(1@2,-; 1@3,-; 2, 2) given in Table 3.3. This code does not have a finite synchronisation
delay since if the message aaa... is transmitted and the first bit is deleted, then the decoder
continues to misplace the codeword boundaries until it receives the codeword 111.
However, this behaviour is not catastrophic, for while the decoder is out of
synchronisation, it is still decoding correct symbols and another symbol error occurs only
when the decoder goes back in synchronisation. Therefore, for a finite number of errors we
have a finite number of symbol errors.






Source Symbol    Codeword
a                00
b                111

Table 3.3: Simple VLEC code C6



In practice, however, catastrophic behaviour is not as serious as it may appear, since 
this catastrophic behaviour is present only for particular messages. Hence, even if a code is 
catastrophic, the probability that it will produce an infinite number of symbol errors from a 
finite number of channel errors is very small. However, catastrophic codes may have 
longer error propagations. 

3.5. Performance 

In general, the performance of error-correcting codes is difficult to determine 
analytically. For this reason two alternative approaches are usually followed. The 
performance of error-correcting codes may be bounded using the code characteristics (such 
as the minimum distance in the case of block codes and the free distance in the case of 
convolutional codes). Alternatively, computer simulations of the particular code on the 
given channel may be performed. Computer simulations tend to be inadequate for high 
signal-to-noise ratios, due to the large number of sample points required to obtain an 
accurate result. On the other hand, bounds tend to be tight at high signal-to-noise ratios but 
not so tight at the low end. Hence, both these methods are useful to obtain the performance 
characteristics of codes. In addition, bounds usually indicate which code parameters
most influence the performance of the code. This in turn enables us to
design good codes.

VLEC codes are no exception to these rules, as we shall see in this section. We shall 
first derive an upper bound on the symbol error probability of a VLEC code in terms of its 
free distance. 

3.5.1. Union Bounds 

A first error event is said to occur at an arbitrary bit position $r$, corresponding to state
$S_r$, if the correct path is eliminated for the first time at bit position $r$ in favour of a
competitor (the incorrect path). Denote the first error event probability at bit position $r$ as
$P_f(E, r)$. The incorrect path must be some path that had previously diverged from the
correct one at some bit position $q$, $q < r$, and is now converging back for the first time at bit
position $r$.

Let $p^i_{q,r}$ be the correct path segment through the trellis with initial and final states $S_q$
and $S_r$ respectively, with $\|p^i_{q,r}\| = \eta_i$. By definition, $|p^i_{q,r}| = r - q$, and let $N = r - q$. Denote
by $f_i \in F_N$ the codeword of the extended code $F_N$ of order $N$ corresponding to
$p^i_{q,r} = (c_{i_1}, c_{i_2}, \ldots, c_{i_{\eta_i}})$, i.e. $f_i = c_{i_1} c_{i_2} \cdots c_{i_{\eta_i}}$. Note that, as before, we can also say that
$p^i_{q,r} = (a_{i_1}, a_{i_2}, \ldots, a_{i_{\eta_i}})$ where $c_i$ is the codeword mapped to the source symbol $a_i$. Further,
let $p^j_{q,r}$ be a second path segment through the trellis, with $\|p^j_{q,r}\| = \eta_j$ and with the same
initial and final states as $p^i_{q,r}$, and let this be encoded as $f_j$. Note that $f_j \in F_N$ as well, since
$|p^i_{q,r}| = |p^j_{q,r}| = N$. In general, however, $\eta_i \neq \eta_j$.

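Assuming a memory-less source, the probability of occurrence of $p^i_{q,r}$, $P(p^i_{q,r})$,
is given by

$$P(p^i_{q,r}) = P(f_i) = P(c_{i_1}) P(c_{i_2}) \cdots P(c_{i_{\eta_i}}) = P(a_{i_1}) P(a_{i_2}) \cdots P(a_{i_{\eta_i}}). \qquad (3.23)$$

Let $h = H(f_i, f_j)$. We will also denote this distance by $H(p^i_{q,r}, p^j_{q,r})$. Then, the
probability that the path $p^i_{q,r}$ is decoded incorrectly as $p^j_{q,r}$ over the BSC with cross-over
probability $p$, $P_h$, is given by

$$P_h = \begin{cases} \displaystyle\sum_{e=(h+1)/2}^{h} \binom{h}{e} p^e (1-p)^{h-e}, & h \text{ odd} \\[2ex] \displaystyle\frac{1}{2}\binom{h}{h/2} p^{h/2} (1-p)^{h/2} + \sum_{e=h/2+1}^{h} \binom{h}{e} p^e (1-p)^{h-e}, & h \text{ even.} \end{cases} \qquad (3.24)$$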
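The pairwise error probability $P_h$ is straightforward to evaluate numerically. The Python sketch below assumes the standard form used for convolutional codes over the BSC [Lin & Costello, 1983], in which a tie at even $h$ is resolved at random. The function name `pairwise_error_prob` is illustrative only and does not come from the thesis.

```python
from math import comb


def pairwise_error_prob(h: int, p: float) -> float:
    """P_h: probability of preferring a path at Hamming distance h over the BSC.

    Assumes the standard form in which ties (possible only for even h)
    are broken at random.
    """
    # More than half of the h differing positions in error: always a wrong choice.
    tail = sum(comb(h, e) * p**e * (1 - p) ** (h - e)
               for e in range(h // 2 + 1, h + 1))
    if h % 2 == 1:
        return tail
    # Exactly h/2 errors: both paths are equidistant, so half the ties go wrong.
    tie = 0.5 * comb(h, h // 2) * (p * (1 - p)) ** (h // 2)
    return tie + tail


if __name__ == "__main__":
    for h in (3, 4, 5, 6):
        print(h, pairwise_error_prob(h, 0.1))
```

Under this form, $p = 0.1$ gives $P_3 = P_4 = 0.028$: an odd $h$ and the even $h$ that follows it share the same pairwise error probability.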



Lemma 3.2: The first error event probability at any bit position, $P_f(E)$, is bounded by

$$P_f(E) < \sum_{h=d_{free}}^{\infty} A_h P_h$$

where $A_h$ is the average number of converging pairs of paths at Hamming distance $h$ and
$d_{free}$ is the free distance of the code.

Proof. $P_f(E, r)$ can be bounded, using a union bound, by the sum of the error
probabilities of all paths which have the same initial and final states $S_q$ and $S_r$ respectively,
$\forall\, q = 0, 1, \ldots, r-1$ (footnote 5). Now, the joint probability that the correct path is $p^i_{q,r}$ and that this is
erroneously decoded as $p^j_{q,r}$ is given by $P_h P(p^i_{q,r})$. Hence,

$$P_f(E, r) < \sum_{q=0}^{r-1} \sum_{(i,j) \in G_{q,r}} P_h P(p^i_{q,r}) \qquad (3.25)$$

where $P_h$ is given by equation (3.24) with $h = H(p^i_{q,r}, p^j_{q,r})$, and $G_{q,r}$ is the set of all pairs of
path indices $(i, j)$ such that $i \neq j$ and $|(p^i_{q,r})_\alpha| \neq |(p^j_{q,r})_\beta|$ $\forall\, \alpha = 1, 2, \ldots, \eta_i - 1$ and $\beta = 1, 2, \ldots, \eta_j - 1$.
Here $(p^i_{q,r})_\alpha$ denotes that part of path segment $p^i_{q,r}$ containing the initial $\alpha$
transitions, i.e. $(p^i_{q,r})_\alpha = (c_{i_1}, c_{i_2}, \ldots, c_{i_\alpha})$. In other words, $G_{q,r}$ is the set of all pairs of path
segment indices corresponding to path segments which diverge at state $S_q$ and merge again
for the first time at state $S_r$.

Furthermore, in Section 3.2.2.1 we have seen that all states in the trellis are identical. 
Hence, we may further overbound (3.25) by considering all paths which merge (at any bit 
position), i.e. 

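$$P_f(E, r) < \sum_{k=1}^{\infty} \sum_{(i,j) \in G_{0,k}} P_h P(p^i_{0,k}). \qquad (3.26)$$

Note that in this case $h = H(p^i_{0,k}, p^j_{0,k})$.

Since the RHS of (3.26) is now independent of $r$, the first error event probability at any bit
position, $P_f(E)$, is bounded by

$$P_f(E) < \sum_{k=1}^{\infty} \sum_{(i,j) \in G_{0,k}} P_h P(p^i_{0,k}). \qquad (3.27)$$

Let $A_h$ be the average number of converging pairs of paths at Hamming distance $h$,
i.e.

$$A_h = \sum_{k=1}^{\infty} \sum_{\substack{(i,j) \in G_{0,k} \\ H(p^i_{0,k},\, p^j_{0,k}) = h}} P(p^i_{0,k}). \qquad (3.28)$$

Then, we can re-write (3.27) as

$$P_f(E) < \sum_{h=d_{free}}^{\infty} A_h P_h \qquad (3.29)$$

where $d_{free}$ is the free distance of the VLEC code $C$ as given in Definition 3.11.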



5 In general, not all $q$'s are possible, and hence for some values of $q$, $P(p^i_{q,r}) = 0$.

The set $A_h$ for all possible values of $h$ is said to constitute the distance spectrum for
the code.

One may immediately recognise that the bound given by (3.29) is in the same form as 
that for convolutional codes [Lin & Costello, 1983]. 

Theorem 3.5: The error event probability, $P(E)$, at any bit position is bounded by

$$P(E) < \sum_{h=d_{free}}^{\infty} A_h P_h.$$

Proof. The final decoded path can diverge from, and converge with, the correct path
any number of times. Figure 3.8 shows the three possible interactions between two
incorrect paths $p^j_r$ and $p^k_{r'}$ and the correct path $p^i_r$. The incorrect path $p^j_r$ contains the
incorrect path segment $p^j_{q,r}$, while the incorrect path $p^k_{r'}$ contains the incorrect path
segment $p^k_{q',r'}$. When $q > r'$, as in case (a), the two error events are separate. In this case,
the first error event occurs at bit position $r'$. This implies that at state $S_{r'}$, the partial metric
for the path segment $p^k_{q',r'}$ is greater than that for the correct path segment $p^i_{q',r'}$. A second
error event occurs at bit position $r$. Again, this implies that the partial metric for $p^j_{q',r}$ (footnote 6) at
state $S_r$ is greater than that for $p^i_{q',r}$. This in turn implies that the partial metric for $p^j_r$ is
greater than that for the correct path at state $S_r$. Hence, if $p^j_r$ were compared to $p^i_r$ at bit
position $r$, a first error event would be made. Hence the error event probability at bit
position $r$, $P(E, r)$, is bounded by

$$P(E, r) < P_f(E, r). \qquad (3.30)$$
The bound given by (3.30) holds also for cases (b) and (c) of Figure 3.8 and hence is
a valid bound for any error event occurring at bit position $r$. In both these cases, the error
event at bit position $r$ replaces at least a portion of a previous error event. The net effect
may be a decrease in the number of decoding errors (i.e. the number of positions in which
the decoded path differs from the correct one). Hence, using the first error event
probability as a bound may be conservative in these cases. Substituting for $P_f(E, r)$ in
(3.30) and noting that this is independent of $r$, we have

$$P(E) < \sum_{h=d_{free}}^{\infty} A_h P_h. \qquad (3.31)$$

■

Figure 3.8: The three possible error event interactions

6 Note that $p^j_{q',r}$ contains also $p^k_{q',r'}$.

In a similar fashion we can obtain a union bound for the symbol error probability of VLEC 
codes, Ps(E). 

Theorem 3.6: The symbol error probability of a VLEC code $C$ is bounded by

$$P_s(E) < \sum_{h=d_{free}}^{\infty} B_h P_h$$

where $B_h$ is the average Levenshtein distance of all converging pairs of paths whose
encoded messages are at a Hamming distance $h$ from each other.

Proof: Again, let $p^i_{q,r}$ be the correct path segment. Let the source symbol sequence
associated with $p^i_{q,r}$ be $(a_{i_1}, a_{i_2}, \ldots, a_{i_{\eta_i}})$ and let $\mathbf{a}_i = a_{i_1} a_{i_2} \cdots a_{i_{\eta_i}}$. Similarly, let $p^j_{q,r}$ be the
incorrect path segment and $\mathbf{a}_j = a_{j_1} a_{j_2} \cdots a_{j_{\eta_j}}$, where $p^j_{q,r} = (a_{j_1}, a_{j_2}, \ldots, a_{j_{\eta_j}})$. Then, the number
of symbol errors in the decoded message is given by $L(\mathbf{a}_i, \mathbf{a}_j)$ (see Section 2.9.1). Hence,
averaging the number of symbol errors over all possible pairs of converging paths, we have

$$P_s(E) < \sum_{r=1}^{\infty} \sum_{(i,j) \in G_{0,r}} L(\mathbf{a}_i, \mathbf{a}_j)\, P_h\, P(p^i_{0,r}) \qquad (3.32)$$

where, as before, $h$ denotes the Hamming distance between the encoded paths $p^i_{0,r}$ and
$p^j_{0,r}$, i.e. $h = H(p^i_{0,r}, p^j_{0,r})$.

Let $B_h$ be the average Levenshtein distance between all converging path pairs whose
encoded messages are at a Hamming distance $h$ from each other, given by

$$B_h = \sum_{r=1}^{\infty} \sum_{\substack{(i,j) \in G_{0,r} \\ H(p^i_{0,r},\, p^j_{0,r}) = h}} L(\mathbf{a}_i, \mathbf{a}_j)\, P(p^i_{0,r}) \qquad (3.33)$$

then,

$$P_s(E) < \sum_{h=d_{free}}^{\infty} B_h P_h. \qquad (3.34)$$

■

3.5.1.1. Evaluating the Distance Spectrum 

In Chapter 6 we shall prove that VLEC codes are always non-linear. In addition,
within a set of paths of the same length $k$, not all path comparisons are allowed, since the
pair of paths must have indices which are elements of $G_{0,k}$. Hence, there are no distance
invariant VLEC codes. Therefore, to evaluate the distance spectrum for any given code,
we must consider all possible paths through the trellis and then take the average as given
in equations (3.28) or (3.33) as required. In the case of convolutional codes, which are
linear, this task is much simplified, since we only need to take the all zero path as the
correct path, and evaluate the weight spectrum of the other converging paths [Lin &
Costello, 1983]. There are several fast algorithms which take advantage of this linearity
property of convolutional codes to evaluate their distance spectrum. One such algorithm is
the FAST algorithm given by Cedervall and Johannesson [1989]. Rouanne and Costello
[1989] give an algorithm to evaluate the distance spectrum of regular and quasi-regular
trellis codes, which are non-linear codes. However, to simplify the algorithm they define
an equivalence relation which enables them to reduce the problem to a linear one.
Unfortunately, this technique cannot be applied in the case of VLEC codes in general.
Some slight simplifications may be achieved for some classes of VLEC codes; however,
these will not be treated here.






A brief outline of two algorithms developed to calculate the distance spectrum of 
VLEC codes is now presented. For a more detailed treatment, one should refer to 
[Buttigieg, 1994a] (see Appendix B). 

The first algorithm is optimised for speed, but requires a large memory space. In this
algorithm, all possible paths in the trellis starting from state $S_0$ are extended one after the
other, the path $p^i_r$ with minimum $r$ being extended first each time. Each extended path is
compared to all the other previously extended paths in the trellis and their intermediate
Hamming distances are updated. When two paths converge to the same state, then the
Levenshtein distance between the source symbols representing the paths is calculated,
together with the probabilities of the two paths. These values are then used as two
components (footnote 7) in the summations given by equations (3.28) and (3.33). Once two paths
converge, then all extended paths from these two are no longer compared with each other.
The algorithm is stopped when some pre-determined state is reached.

In order to reduce the memory requirements of this algorithm and increase the speed
of operation, we may limit the value of $h$ for which we calculate $A_h$ and $B_h$, since, as we
shall see in Section 3.5.2, only the values of $B_h$ for small $h$ are significant in the bound
given by expression (3.34).

The main disadvantage with this algorithm is that we need to store almost all the 
different paths through the tree representation of the VLEC codes together with their 
probabilities and their pairwise intermediate Hamming distances. 

In the second algorithm, we eliminate the above disadvantage by extending only two 
paths at a time. The algorithm is outlined below. 

1. Set $r$, the number of bits in the encoded path, to be equal to the length of the shortest
codeword in the VLEC code $C$, i.e. $r = L_1$.

2. Find the first path in the trellis consisting of exactly $r$ bits, i.e. $p^i_r$. Call this the
reference path. If no such path is found, go to step 6.

3. Find the next path going to state $S_r$, $p^j_r$, which converges with the reference path for
the first time at state $S_r$, if any. Call this the secondary path and proceed to step 4.
Otherwise, if no further paths are found, go to step 5.

4. Calculate the Hamming distance between the encoded reference and secondary paths
and the Levenshtein distance between the source messages representing these paths.
Calculate also the probability of the reference and the secondary paths. These values
are then used in equations (3.28) and (3.33) to compute the various $A_h$ and $B_h$
respectively. Repeat step 3.

5. Find the next path after the reference path going to state $S_r$, if any, and call this the
new reference path. If a new reference path is found, go back to step 3. Otherwise,
proceed to the next step.

6. Increase $r$ by $\gcd(L_1, L_2, \ldots, L_\sigma)$. Repeat from step 2 until $r$ exceeds the required final
state.

7 Since if path $p^i_r$ converges to path $p^j_r$, then path $p^j_r$ also converges to path $p^i_r$.

The memory requirements of the above algorithm are very small. All we need to
store at any one time are two paths. However, compared with the first algorithm, we are
repeating several calculations unnecessarily. Many pairs of paths have the same initial
transitions and differ only in later ones. With this second algorithm, we need to calculate
the Hamming distance between the common transitions each time. We also waste time
finding all the paths going to state $S_r$ in a serial fashion. The same paths are repeatedly
searched every time that we change the reference path and every time we increase $r$. In
practice, however, it is found that for a given limited memory space and time constraint, this
second algorithm allows us to compute $A_h$ and $B_h$ for longer paths through the trellis than
the first algorithm and hence can be more accurate.

As in the case of the previous algorithm, we may increase the speed by limiting the
value of $h$ for which we evaluate $A_h$ and $B_h$.
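The pairwise comparison at the heart of both algorithms can be sketched in Python as follows. This is a brute-force illustration rather than either of the algorithms of [Buttigieg, 1994a]: it holds all paths reaching state $S_r$ in memory and compares every ordered (reference, secondary) pair that shares no intermediate state. The helper names, the uniform source and the use of the ordinary edit distance for the Levenshtein distance of Section 2.9.1 are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce
from math import gcd, prod


def levenshtein(s, t):
    """Edit distance (insertions, deletions, substitutions) between sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def paths_to_state(code, r):
    """All symbol sequences whose encoding is exactly r bits (paths S_0 -> S_r)."""
    if r == 0:
        return [()]
    found = []
    for sym, cw in code.items():
        if len(cw) <= r:
            found += [(sym,) + rest for rest in paths_to_state(code, r - len(cw))]
    return found


def inner_states(code, path):
    """Bit positions of the intermediate states visited by a path."""
    pos, states = 0, set()
    for sym in path[:-1]:
        pos += len(code[sym])
        states.add(pos)
    return states


def distance_spectrum(code, prob, r_max):
    """Accumulate A_h and B_h over all paths ending at states S_r, r <= r_max."""
    A, B = defaultdict(float), defaultdict(float)
    lengths = [len(cw) for cw in code.values()]
    step = reduce(gcd, lengths)
    for r in range(min(lengths), r_max + 1, step):
        paths = paths_to_state(code, r)
        for ref in paths:                          # reference path
            p_ref = prod(prob[s] for s in ref)
            bits_ref = "".join(code[s] for s in ref)
            for sec in paths:                      # secondary path
                # Keep only pairs that diverge at S_0 and first re-merge at S_r.
                if sec == ref or inner_states(code, ref) & inner_states(code, sec):
                    continue
                bits_sec = "".join(code[s] for s in sec)
                h = sum(x != y for x, y in zip(bits_ref, bits_sec))
                A[h] += p_ref
                B[h] += levenshtein(ref, sec) * p_ref
    return dict(A), dict(B)


# Code C6 of Table 3.3 with an assumed uniform source, paths up to state S_6.
A, B = distance_spectrum({"a": "00", "b": "111"}, {"a": 0.5, "b": 0.5}, r_max=6)
print(A, B)
```

For this small example the only converging pairs up to $S_6$ are (ab, ba) at Hamming distance 4 and (aaa, bb) at Hamming distance 6, giving $A_4 = 0.5$, $A_6 = 0.375$, $B_4 = 1.0$ and $B_6 = 1.125$. The exhaustive enumeration grows exponentially with $r$, which is why the thesis limits the final state and the value of $h$.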

3.5.2. Simulation 

The second method which is used to determine the performance of VLEC codes is
that of computer simulation. In this case, average values for the symbol error probability at
various values of SNR are calculated by encoding, corrupting and then decoding a number
of messages of some given length, and repeating this a given number of times dependent
on the power of the code being used, the message length used and the state of the channel.
The technique used is that of Monte Carlo simulation [Jeruchim et al., 1992]. The
following procedure is adopted.

1. A source message, a, of length $\eta$ source symbols is generated using the source
symbol probabilities.

2. The source message is encoded using the VLEC code C to give the encoded message 
x. 

3. The encoded message x is corrupted using the BSC to give the corrupted received 
message y. 

4. y is decoded using the modified Viterbi decoding algorithm given in Section 3.2.2.2 
(using either the maximum likelihood or the MAP metric), to give the decoded 
message b. 

5. The symbol error probability of b as compared to a is calculated using the method 
given in Section 2.9.2. 

The above algorithm is repeated for various values of p to give the performance 
curve for the code. 
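The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulator used for the results in this chapter: the decoder is a plain minimum Hamming distance search over the bit-position trellis (standing in for the modified Viterbi algorithm with the maximum likelihood metric), the symbol error rate is taken as the Levenshtein distance divided by the message length (standing in for the method of Section 2.9.2), and all function names are hypothetical.

```python
import random


def levenshtein(s, t):
    """Edit distance between two symbol sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def encode(code, message):
    return "".join(code[s] for s in message)


def bsc(bits, p, rng):
    """Flip each bit independently with probability p."""
    return "".join("10"[int(b)] if rng.random() < p else b for b in bits)


def ml_decode(code, received):
    """Minimum Hamming distance path through the bit-position trellis."""
    n = len(received)
    inf = float("inf")
    best = [0.0] + [inf] * n       # best[r]: smallest distance of a path to S_r
    back = [None] * (n + 1)        # back[r]: (previous state, decoded symbol)
    for r in range(1, n + 1):
        for sym, cw in code.items():
            q = r - len(cw)
            if q < 0 or best[q] == inf:
                continue
            d = best[q] + sum(x != y for x, y in zip(cw, received[q:r]))
            if d < best[r]:
                best[r], back[r] = d, (q, sym)
    decoded, r = [], n
    while r > 0:                   # trace the surviving path back to S_0
        r, sym = back[r]
        decoded.append(sym)
    return decoded[::-1]


def simulate(code, prob, p, eta, runs, seed=0):
    """Estimate the symbol error rate at BSC crossover probability p."""
    rng = random.Random(seed)
    symbols = list(prob)
    weights = [prob[s] for s in symbols]
    errors = 0
    for _ in range(runs):
        a = rng.choices(symbols, weights=weights, k=eta)   # step 1
        x = encode(code, a)                                 # step 2
        y = bsc(x, p, rng)                                  # step 3
        b = ml_decode(code, y)                              # step 4
        errors += levenshtein(a, b)                         # step 5
    return errors / (runs * eta)


if __name__ == "__main__":
    c6 = {"a": "00", "b": "111"}          # code C6 of Table 3.3
    uniform = {"a": 0.5, "b": 0.5}        # assumed source
    for p in (0.0, 0.01, 0.05):
        print(p, simulate(c6, uniform, p, eta=50, runs=200))
```

As in the text, the decoder here is assumed to know the total number of received bits, so it only traces back from the final state.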

3.5.3. Comparing Simulation Results and the Union Bound 

By comparing the results obtained through simulation to the union bound on the 
symbol error probability obtained in Section 3.5.1, we can determine the tightness of the 
bound, and also the validity of the simulation results. Two such comparisons are presented 
here. 

In all the results presented in this thesis, unless otherwise stated, it is assumed that
the BSC is derived from the additive white Gaussian noise (AWGN) channel with
coherently detected binary phase shift keying (BPSK) modulation. In this case, the
crossover probability $p$ is given by

$$p = Q\left(\sqrt{\frac{2 E_b R}{N_0}}\right) \qquad (3.35)$$

where $E_b$ is the energy per bit, $N_0$ is the level of the single-sided power spectral density of
white noise, $R$ is the code rate given by $\lceil \log_2 s \rceil / L_{average}$ (footnote 8) and $Q(x)$ is the complementary
error function given by [Sklar, 1988]

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du. \qquad (3.36)$$
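As a numerical illustration, the crossover probability can be computed from $E_b/N_0$ in dB as sketched below. The sketch assumes that $Q(x)$ is the Gaussian tail probability, which can be written as $\tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, and that $s$ is the number of codewords. The function names are illustrative only.

```python
from math import erfc, sqrt


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x), written in terms of math.erfc."""
    return 0.5 * erfc(x / sqrt(2.0))


def code_rate(code, prob):
    """R = ceil(log2 s) / L_average for a code with s codewords."""
    l_average = sum(prob[sym] * len(cw) for sym, cw in code.items())
    # (s - 1).bit_length() equals ceil(log2 s) without floating-point rounding.
    return (len(code) - 1).bit_length() / l_average


def crossover_probability(ebn0_db: float, rate: float) -> float:
    """BSC crossover probability for coherent BPSK over the AWGN channel."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)       # dB to linear ratio
    return q_function(sqrt(2.0 * ebn0 * rate))


if __name__ == "__main__":
    c6 = {"a": "00", "b": "111"}          # code C6 of Table 3.3
    rate = code_rate(c6, {"a": 0.5, "b": 0.5})
    for ebn0_db in (2, 4, 6, 8, 10):
        print(ebn0_db, crossover_probability(ebn0_db, rate))
```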



Figure 3.9 shows the performance curves for code C3 (2@5,3; 6@8,3; 3,2) given in
Table 2.3, both with source $A_1$ (also given in Table 2.3) and with a uniform source. The
distance spectrum for this code, calculated up to state $S_{33}$ and $h = 10$, is given in Table 3.4.
One may observe from this figure that the bound is tight for high SNR, as expected.

Figure 3.9: Symbol error probability curves for C3 (symbol error probability versus $E_b/N_0$ in dB; bound and simulation curves for the uniform source and for source $A_1$)

8 Note that $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$.

As a second example, consider code C7 (1@3,-; 16@10,3; 3,0) given in Table 3.5
together with its distance spectrum as given in Table 3.6 up to state $S_{30}$ and $h = 10$. In this
case, if we compare the bound to the simulation results, we find that the bound is not very
tight even at 10 dB (see Figure 3.10). The reason for this is that in the bound given by
(3.34) we are taking the union of all possible error events, even if some of them overlap.
Hence, the symbol error probability is inflated. This becomes more apparent when the
number of possible paths through the trellis is increased, as in the case of C7, where the
number of codewords is greater than that of C3. We can, however, tighten the union bound
at high SNR as follows.



         Uniform Source        Source A1
h        A_h       B_h         A_h       B_h
3        2.50      2.50        1.70      1.70
4        1.50      1.50        0.70      0.70
5        0.07      0.15        0.09      0.18
6        0.70      1.47        0.90      1.84
7        1.62      3.61        1.94      4.18
8        2.72      6.95        3.16      7.55
9        5.19      14.71       5.76      15.18
10       8.84      26.92       9.07      25.93

Table 3.4: Distance spectrum for code C3 up to state S33



Source Symbols    Codeword
a                 000
b                 1110000000
c                 1110001111
d                 1110010011
e                 1110011100
f                 1110100101
g                 1110101010
h                 1110110110
i                 1110111001
j                 1111000110
k                 1111001001
l                 1111010101
m                 1111011010
n                 1111100011
o                 1111101100
p                 1111110000
q                 1111111111

Table 3.5: Code C7






         Uniform Source
h        A_h       B_h
3        6.60      6.60
4        6.60      6.61
5        0.07      0.15
6        0.29      0.62
7        1.43      2.03
8        0.77      1.89
9        1.48      3.99
10       2.45      6.98

Table 3.6: Distance spectrum for code C7 up to state S30



Figure 3.10: Symbol error probability curves for C7 (symbol error probability versus $E_b/N_0$ in dB; bound, simulation and approximation curves)

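Corollary 3.1: The symbol error probability of a VLEC code at high SNR is
approximately given by

$$P_s(E) \approx B_{d_{free}} P_{d_{free}}. \qquad (3.37)$$

Proof. For high SNR, $p$ is very small. Hence from equation (3.24) we can deduce
that in this case, for even $h$, $P_h \gg P_{h+1}$. However, for odd $h$, $P_h = P_{h+1}$. Therefore, for high
SNR we can approximate the bound given by (3.34) to

$$P_s(E) \lesssim \begin{cases} B_{d_{free}} P_{d_{free}}, & d_{free} \text{ even} \\ \left(B_{d_{free}} + B_{d_{free}+1}\right) P_{d_{free}}, & d_{free} \text{ odd.} \end{cases} \qquad (3.38)$$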




Furthermore, since $p$ is very small, the bit errors are, with high probability,
relatively far apart. Hence the error events are non-overlapping. Also, since $B_h$ only
includes pairwise comparisons, a proportion of the error patterns of weight $d_{free}+1$ are
also covered by others of weight $d_{free}$ (footnote 9). Hence, as a first order approximation we can take

$$P_s(E) \approx B_{d_{free}} P_{d_{free}} \qquad (3.39)$$

for any $d_{free}$ and very small $p$. ■

The approximate expression for the symbol error probability given by (3.39) for code
C7 is plotted in Figure 3.10. From this figure we can observe that this is indeed a good
approximation to the true symbol error probability of the code for high SNR. It is
important to note, however, that (3.39) is no longer an upper bound for the symbol error
probability at low SNR.
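The difference between the union bound and the high-SNR approximation can be illustrated numerically. The Python sketch below uses the $B_h$ values for code C7 as read from Table 3.6, so the bound is truncated at $h = 10$. It assumes the standard BSC pairwise error probability with random tie-breaking for $P_h$. The function names are illustrative only.

```python
from math import comb


def pairwise_error_prob(h: int, p: float) -> float:
    """P_h over the BSC, assuming ties at even h are broken at random."""
    tail = sum(comb(h, e) * p**e * (1 - p) ** (h - e)
               for e in range(h // 2 + 1, h + 1))
    if h % 2 == 1:
        return tail
    return 0.5 * comb(h, h // 2) * (p * (1 - p)) ** (h // 2) + tail


# B_h for code C7 with a uniform source, as read from Table 3.6 (h = 3..10).
B_C7 = {3: 6.60, 4: 6.61, 5: 0.15, 6: 0.62, 7: 2.03, 8: 1.89, 9: 3.99, 10: 6.98}


def union_bound(B, p):
    """Truncated union bound: sum of B_h * P_h over the tabulated h."""
    return sum(b * pairwise_error_prob(h, p) for h, b in B.items())


def high_snr_approximation(B, p):
    """First order approximation: B_dfree * P_dfree."""
    d_free = min(B)
    return B[d_free] * pairwise_error_prob(d_free, p)


if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        bound = union_bound(B_C7, p)
        approx = high_snr_approximation(B_C7, p)
        print(p, bound, approx, bound / approx)
```

With these values the ratio of the truncated bound to the approximation tends to $(B_3 + B_4)/B_3$, which is about 2, as $p$ becomes small, because $P_3 = P_4$ under the assumed form of $P_h$.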

9 For example, for the three merged paths (for some code) 00000, 01110 and 11101, the error pattern 01100
is included in $B_3$ since it gives the decoding error (00000)→(01110). However, it is also included in $B_4$ since
it also gives the decoding error (00000)→(11101). Obviously, in this case only one of these two outcomes is
possible, hence the overbound.

3.5.4. Comparing Maximum Likelihood and MAP Decoding

Theorem 3.7: Maximum likelihood and MAP decoding for a VLEC code $C$ are
equivalent iff $l_i / \log_2 P(c_i) = -\theta$ $\forall\, c_i \in C$, where $\theta$ is some positive constant, $l_i = |c_i|$ and
$P(c_i)$ is the probability of occurrence of $c_i$.

Proof. It may be recalled from Section 3.2.2.2 that the branch metric in the Viterbi
decoder for maximum likelihood decoding is $m_{j,ML} = -H(c_j, y_n y_{n+1} \cdots y_{n+l_j-1})$ whereas, from
Section 3.3, the branch metric for MAP decoding is $m_{j,MAP} = [\log p - \log(1-p)] H(c_j,
y_n y_{n+1} \cdots y_{n+l_j-1}) + \log P(c_j)$. Since the branch metrics are applied on codewords of unequal
length, it is simpler to compare the two metrics for equal length paths of, say, $N$ bits.
In this case, the two respective metrics at state $S_N$ become $M_{N,ML} = -H(f_i, y)$ and $M_{N,MAP} =
[\log p - \log(1-p)] H(f_i, y) + \log P(f_i)$ (from equation (3.6)), where $f_i \in F_N$, $F_N$ being the
extended code of $C$ of order $N$. Hence, for maximum likelihood and MAP decoding of
VLEC codes to be equivalent, i.e. for $M_{N,ML}$ to be equivalent to $M_{N,MAP}$, we require that $P(f_i)$ is
constant for all $f_i \in F_N$. In other words, we require that all codeword sequences of the
VLEC code $C$ of total length $N$ bits have the same probability. Now,

$$N = n_1 l_1 + n_2 l_2 + \cdots + n_s l_s \qquad (3.40)$$

where $n_i$ is the number of times the codeword $c_i$ appears in the $N$-bit sequence.

Also, for all codeword sequences of total length $N$ bits to be equi-probable, we need
that

$$P(c_1)^{n_1} \times P(c_2)^{n_2} \times \cdots \times P(c_s)^{n_s} = 2^{-N/\theta} \qquad (3.41)$$

where $\theta$ is some positive constant (it needs to be positive since the LHS of equation (3.41)
is always less than 1). Taking logs on both sides of (3.41) and rearranging,

$$-n_1 \theta \log_2 P(c_1) - n_2 \theta \log_2 P(c_2) - \cdots - n_s \theta \log_2 P(c_s) = N. \qquad (3.42)$$

Hence, from equations (3.40) and (3.42) we have

$$n_1 l_1 + n_2 l_2 + \cdots + n_s l_s = -n_1 \theta \log_2 P(c_1) - n_2 \theta \log_2 P(c_2) - \cdots - n_s \theta \log_2 P(c_s). \qquad (3.43)$$

Comparing coefficients of $n_1, n_2, \ldots, n_s$ in (3.43), we have that (3.43) is true iff

$$\frac{l_i}{\log_2 P(c_i)} = -\theta, \quad \forall\, i = 1, 2, \ldots, s. \qquad (3.44)$$



If the lengths and probabilities of the codewords of a code $C$ do not satisfy (3.44),
then MAP decoding will always outperform maximum likelihood decoding. However, we
observe from the MAP metric given by expression (3.6) that as $p \to 0$, $|h_m \log p| \gg
|\log P(f_m)|$ and hence in this case we may neglect $\log P(f_m)$ in the MAP metric. We can do
this only if $d_{free}$ is odd (footnote 10); otherwise, if this is even, there will be some received bit
sequences which will be equidistant from two codeword sequences, in which case $h_m \log p$
will be equal for both codeword sequences and hence in this case $\log P(f_m)$ will make a
difference. Hence as $p \to 0$, and for odd $d_{free}$, maximum likelihood decoding is asymptotic
to MAP decoding. How fast this occurs depends on the variability of the factor
$l_i / \log_2 P(c_i)$ and the values of $B_h$ for even $h$.



10 Only $d_{free}$ is considered here since $p \to 0$.



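Definition 3.13: The MAP factor of a VLEC code $C$ is defined to be

$$\phi_{MAP} = \frac{\sigma\left( l_i / \log_2 P(c_i) \right)}{L_{average}} \qquad (3.45)$$

where $\sigma(\cdot)$ denotes the standard deviation, and $L_{average}$ is the average codeword length with
the given codeword probabilities.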

It is obvious that if $\phi_{MAP} = 0$, then we have the required condition for maximum
likelihood and MAP decoding to be equivalent. It is conjectured that for any given VLEC
code, the larger $\phi_{MAP}$, the larger the gain with MAP decoding over maximum likelihood at
any given $p$. This is illustrated here for a single code, but has been observed in many other
instances. Code C8 given in Table 3.7 is a (1@4,-; 4@5,3; 3@6,3; 2,1) VLEC code with
free distance 3. With source $A_2$ its MAP factor is 0.006715, whereas with source $A_3$ its
MAP factor is 0.1373. Figure 3.11 shows a comparison between the performance of C8
with the two sources, decoded with the MAP and the maximum likelihood metrics. Notice
that for source $A_2$ (for which C8 has a low MAP factor), maximum likelihood and MAP
decoding are practically equivalent. However, for source $A_3$ (for which C8 has a high MAP
factor) MAP decoding achieves some coding gain over maximum likelihood decoding.
However, since the free distance for C8 is odd, maximum likelihood decoding is
asymptotic to MAP decoding for small $p$, as can be observed in Figure 3.11.



Source Symbol    Source A2    Source A3    Codeword
a                0.20         0.35         0111
b                0.13         0.30         00011
c                0.13         0.10         11101
d                0.13         0.10         01000
e                0.13         0.05         10110
f                0.10         0.05         001011
g                0.09         0.03         100010
h                0.09         0.02         110100

Table 3.7: Different source probabilities for code C8
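The MAP factors quoted for C8 can be checked numerically from the entries of Table 3.7. The sketch below assumes that the MAP factor is the population standard deviation of $l_i / \log_2 P(c_i)$, taken over the codewords with equal weight, divided by the average codeword length. It also assumes that the probabilities and codewords of Table 3.7 pair up in the order listed. The function name `map_factor` is illustrative only.

```python
from math import log2
from statistics import pstdev

# Codewords of C8 and the two sources, in the order listed in Table 3.7.
CODEWORDS = ["0111", "00011", "11101", "01000", "10110",
             "001011", "100010", "110100"]
SOURCE_A2 = [0.20, 0.13, 0.13, 0.13, 0.13, 0.10, 0.09, 0.09]
SOURCE_A3 = [0.35, 0.30, 0.10, 0.10, 0.05, 0.05, 0.03, 0.02]


def map_factor(codewords, probs):
    """Spread of l_i / log2 P(c_i), normalised by the average codeword length."""
    ratios = [len(c) / log2(p) for c, p in zip(codewords, probs)]
    l_average = sum(p * len(c) for c, p in zip(codewords, probs))
    # Population standard deviation, every codeword weighted equally (assumed).
    return pstdev(ratios) / l_average


if __name__ == "__main__":
    print("A2:", map_factor(CODEWORDS, SOURCE_A2))
    print("A3:", map_factor(CODEWORDS, SOURCE_A3))
```

Under these assumptions the computed values are approximately 0.0067 for source $A_2$ and 0.137 for source $A_3$, in agreement with the figures quoted in the text. A code satisfying the condition of Theorem 3.7 exactly would give a MAP factor of zero.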



Figure 3.11: Comparisons between MAP and maximum likelihood decoding for C8 (horizontal axis: $E_b/N_0$ in dB)

3.5.5. Comparing Maximum Likelihood and Instantaneous Decoding 

In Chapter 2 we have given three different instantaneous decoding algorithms for
VLEC codes. The prefix decoding and the segment decoding algorithms presented in
Sections 2.6.1 and 2.6.3 respectively are appropriate for specific error-mappings, whereas
the tail decoding and the maximum likelihood decoding algorithms, presented in Sections
2.8 and 3.2.2.2 respectively, are more general, even though the former must still be used
with instantaneous (or $\alpha$-prompt) VLEC codes to be effective. In this section we will
compare the performance of these decoding algorithms over the BSC.

Table A.1 gives an $\alpha_1$-prompt code for the 26-symbol English source with a code rate
of 0.65. It is also a (13@7,3; 4@10,3; 2@12,4; 2@13,4; 2@14,4; 3@17,3; 3,0) VLEC
code with free distance 3. Figure 3.12 shows a performance comparison between the
various decoding algorithms for VLEC codes. This code was designed to
instantaneously correct a single error per codeword. Notice how prefix decoding
outperforms segment decoding. For instance, at a symbol error probability of $10^{-3}$, prefix
decoding has a coding gain of about 1.5 dB. The reason for this is that this code cannot
correct errors in all segments. Also note that prefix decoding and tail decoding (with the
original metric derived by Massey (footnote 11)) practically have the same performance. However,
maximum likelihood decoding achieves a further 1.3 dB gain over prefix decoding at a
symbol error probability of $10^{-4}$. This is achieved at the expense of increased decoding
complexity. The main reason why maximum likelihood decoding outperforms the other
algorithms is that it is less susceptible to loss of synchronisation. This is not so clear from
Figure 3.12; however, if we were to observe the error distribution for prefix decoding, say,
we would see that there are long periods during which there are no decoding errors.
However, when an error pattern outside the error mapping occurs, the decoder decodes a
different length codeword, resulting in synchronisation loss and a burst of decoding errors.
This effect is especially evident for high SNR. Here, more simulation points are required
than is normally expected, since at high SNR the probability of having an error pattern
which is not included in the admissibility mapping will become very small.




Figure 3.12: Performance comparisons for the $\alpha_1$-prompt code given in Table A.1 (horizontal axis: $E_b/N_0$ in dB)

11 In this case, the bits in the tail are considered to be random bits with probability distribution $Q$ derived
from the code and its probability distribution.



Figure 3.13 compares the performance of the various decoding algorithms for the
$\alpha_{1,1}$-prompt code also given in Table A.1. This is a rate 0.59 (6@7,3; 20@10,3; 3,0)
VLEC code with free distance 3. Note that this time, prefix and segment decoding have
the same performance, whereas the tail decoding algorithm has a marginal coding gain over
these two algorithms, with the metric derived in Section 2.8 achieving a slight gain over the
original metric derived by Massey. The real coding gain is achieved by the maximum
likelihood decoding algorithm. For instance, at a symbol error probability of $10^{-3}$ the
coding gain is about 1.5 dB over the instantaneous algorithms.




Figure 3.13: Performance comparisons for the $\alpha_{1,1}$-prompt code given in Table A.1 (horizontal axis: $E_b/N_0$ in dB)

From other results obtained, it was empirically found that prefix decoding always
performs at least as well as segment decoding. They only have the same performance
for codes constructed to correct errors in all segments of the code. Thus we conjecture that
prefix decoding is always better than (or equal to) segment decoding. Heuristically, we can
explain this by observing that in prefix decoding, the decoding decisions are based
on a larger number of bits than in segment decoding, which only considers a segment at a
time.

Also as expected, the tail decoding algorithm produces better results than both the
other instantaneous decoding algorithms. This is also backed by theory, since the tail
decoding algorithm uses the optimum metric for instantaneous decoding of VLEC
codes. The coding gain achievable by dropping Massey's assumption, that the bits in the
tail are chosen independently, is very slight and in practice not worth implementing, since it
complicates the metric. However, as expected, the best performance is achieved by the
maximum likelihood decoding algorithm.

One further class of VLEC codes that was introduced in Chapter 2 is that of two-length
codes introduced by Dunscombe [1988] (see Section 2.7). Here, the difference
between the four different decoding algorithms is minimal for most codes considered. For
instance, for a two-length code for the 26-symbol English source constructed from the (7,4)
Hamming base code by taking 15 codewords from the base code as short codewords and
using the 16th codeword of the base code as a prefix to 10 other codewords from the base
code to form 16 long codewords of length 14, the performance is practically identical for
all four algorithms. The main reason for this is that, as conjectured by Dunscombe, these
codes have good synchronisation properties with relatively short error spans. Hence the
instantaneous decoding algorithms, in this case, perform almost as well as maximum
likelihood. Two-length codes have two main problems. The first is that the code rate cannot be
increased much for a given error-correcting capability, since the choice of the base code is
limited. The second is that these codes are still very similar to fixed-length codes.
As we shall see in Chapter 5, these codes are not self-synchronising when the channel
admits bit deletions and insertions.

3.6. Decoding Window Depth 

In the modified Viterbi algorithm as given in Section 3.2.2.2, the decoder has to wait
till the end of the message to start decoding. For large $N$ this will cause a very long
decoding delay, which in most cases would be unacceptable. However, as in the case of
convolutional codes, we can limit the decoding delay by forcing the decoder to decode a
symbol after some given number of bits have been received. Let $W$ denote this number of
bits, called the decoding window depth. The decoding algorithm is modified as follows.
When $W$ bits are received (assuming that $W$ is less than $N$), the decoder chooses that state
from the last $2^K$ states in the trellis with the maximum metric, where $K$ is the constraint
length of the code. It then retraces the surviving path corresponding to this state and
decodes the first symbol in this path. All the surviving paths in the trellis which do not
have as their first codeword the decoded codeword are deleted from the trellis. This
ensures that the same number of bits as that transmitted will be decoded. After the first
symbol is decoded, another $L$ bits are required before decoding the next one, where $L$ is
the number of bits in the decoded codeword. Hence, a decision is always based on the
previous $W$ bits held in the decoding window. In this case, the maximum decoding delay (footnote 12)
is the time corresponding to $W$ bits.

Ideally, we should choose W such that all the surviving paths in the trellis have the
same initial transition. This will ensure that the value of the decoded symbol does not
depend on which of the surviving paths is chosen. In practice, we can never achieve
this goal; however, by choosing W large enough we can ensure that this condition is true
most of the time.

Figure 3.14 shows the effect of varying the decoding window depth for the VLEC
code C15 (1@6,-; 2@7,5; 3@8,5; 4@9,5; 5@10,5; 4@11,5; 4@12,5; 3@13,5; 3,2) with
free distance 5 for the 26-symbol English source given in Table A.2. The constraint length
for this code is 3.7. As a rule of thumb it has been found that if W is taken to be about five
times the maximum codeword length in the code, then the performance will be practically
the same as for W = N. However, this factor decreases somewhat when the gcd of the
codeword lengths is not equal to one. For code C15, the maximum codeword length is 13
bits. Hence a decoding window depth of about 65 bits should be adequate. This is clearly
confirmed by the performance curves given in Figure 3.14.

12 Delay is measured from the time the codeword is received until it is decoded. 



[Plot omitted; horizontal axis: E_b/N_0 (dB), from 0 to 8]

Figure 3.14: Effect of decoding window depth on the performance of VLEC code C15

3.7. Complexity 

The question of complexity is always a big issue in any decoding algorithm. To 
complicate things, not everybody agrees on what should be included and what should be 
left out in determining the complexity of any particular algorithm. Complexity is important 
because it determines 

• the maximum operating speed of the decoder for a given hardware 

• the memory requirements 

• the size of the control logic required. 

Of the three aspects mentioned above the third one is in general the most difficult to 
quantify and is greatly influenced by the actual implementation of the algorithm. 

Common criteria used to judge the complexity of a coding system based on Viterbi
decoding are the numbers of comparisons and additions necessary to decode the
transmitted message [McEliece, 1994]. In the case of soft decision decoding, what is
important is the number of real operations. Here, we will be dealing with hard decision
decoding, where the operations are all modulo-2. However, the results apply equally well
to the soft decision case.

In the repetitive part of the trellis for a VLEC code, there are s transitions going to 
each state. Hence, in the modified Viterbi decoding algorithm as given in Section 3.2.2.2, 
we need s - 1 comparisons to determine the one with the maximum metric. Depending on 
which metric is being used, the comparisons may be either between integers or floating 
point numbers. However, even in the case of the MAP metric, we can transform the metric 
to integer values with little degradation in the performance. This is similar to what is done 
in soft decision Viterbi decoding of convolutional codes where the metrics are also 
transformed to integers from floating point numbers [Clark & Cain, 1981]. 

For each transition going into a state, the metric for that transition must be added to
the metric of the surviving path from the previous state. This is equivalent to l additions,
where l is the number of bits representing the transition. Hence, in the repetitive part of the
trellis, we need to perform $\sum_{i=1}^{s} l_i$ additions at each state.

The total number of states in the trellis depends on the message length. If the total
number of bits in the encoded message is N and g is the gcd of the codeword lengths of the
code, then the total number of states, $N_s$, for large N, is approximately given by N/g. The
approximation arises from the fact that during the initial and final parts of the trellis, some
states do not exist. Hence, if the average codeword length for the code C when used to
encode some source A is $L_{average}$, then for large N,

$$N_s \approx \frac{\eta_m L_{average}}{g} \qquad (3.46)$$

where $\eta_m$ is the number of source symbols in the message.

Hence, using expression (3.46) we obtain the following relationships for the number
of comparisons and additions required to perform maximum likelihood decoding of VLEC
codes.

$$\text{Number of comparisons per source symbol} \approx \frac{(s-1)\, L_{average}}{g} \qquad (3.47)$$

$$\text{Number of additions per source symbol} \approx \frac{L_{average} \sum_{i=1}^{s} l_i}{g} \qquad (3.48)$$

The memory requirements for the modified Viterbi decoding algorithm depend to a
large extent on the depth of the decoding window, W. In practice, all we need to store in
this case is W/g source symbols. In addition, we need to store the metric of the surviving
paths. To this end, we need to store $2^K$ metric values, where K is the constraint length of
the VLEC code.
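
As a rough illustration of expressions (3.46) to (3.48), the following Python sketch evaluates the approximate number of comparisons and additions per source symbol. The function name and the three-codeword example are hypothetical and are not taken from any code in this thesis; the codeword probabilities are used only to obtain the average codeword length.

```python
from functools import reduce
from math import gcd


def viterbi_cost_per_symbol(lengths, probs):
    """Approximate comparisons and additions per decoded source symbol.

    lengths[i] is the length in bits of codeword i and probs[i] is its
    probability (hypothetical inputs supplied by the caller).
    """
    s = len(lengths)
    g = reduce(gcd, lengths)                       # gcd of the codeword lengths
    l_avg = sum(l * p for l, p in zip(lengths, probs))
    states_per_symbol = l_avg / g                  # N_s / eta_m for large N
    comparisons = (s - 1) * states_per_symbol      # s - 1 comparisons per state
    additions = sum(lengths) * states_per_symbol   # sum of l_i additions per state
    return comparisons, additions


comparisons, additions = viterbi_cost_per_symbol([3, 4, 5], [0.5, 0.3, 0.2])
print(f"{comparisons:.1f} comparisons, {additions:.1f} additions per symbol")
```

For this toy code the average length is 3.7 bits and g = 1, so the sketch reports about 7.4 comparisons and 44.4 additions per source symbol.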

3.8. Conclusion 

In this chapter we have given a maximum likelihood decoding algorithm for VLEC 
codes which is not, however, instantaneous. We have also derived a MAP metric for 
VLEC codes. However, in practice, the coding gain achievable by using the MAP metric 
instead of the simpler Hamming distance metric for most codes and sources is negligible, 
especially at high SNR. Hence, MAP decoding is not worth implementing due to the 
increased complexity in computing the path metrics. Note also that MAP decoding of 
VLEC codes requires the decoder to know the state of the channel (see footnote 13), something that is not
always possible. The increased coding gain achievable by maximum likelihood decoding 
over the instantaneous decoding algorithms presented in Chapter 2 is, however, significant 
and this warrants the use of maximum likelihood decoding for VLEC codes over noisy 
channels. The question of how the performance of VLEC codes with maximum likelihood 
decoding compares with standard concatenated schemes employing separate source and 
channel coding will be dealt with in Chapter 6. 

We have also seen that the modified Massey metric together with the tail decoding 
algorithm given in the previous chapter is really optimal only in the instantaneous case. In 
many cases, the coding gain achievable with this algorithm over the prefix decoding 
algorithm is minimal. The metric used is more complex, though. Hence, its practical use 
is minimal, especially in view of the larger gain achievable with maximum likelihood. 



13 This is so because p, the cross-over probability, is part of the MAP metric.



An interesting aspect which comes out from the derivation of the union bound on the 
symbol error probability of VLEC codes given in Section 3.5.1 is that it agrees perfectly 
with the way in which the symbol error probability was calculated in Section 2.9.2, where 
the Levenshtein distance between the transmitted message and the decoded one is 
approximated by effectively segmenting the received message into separate error events 
and evaluating the Levenshtein distance separately on these segments. This also seems to 
us the most natural way in which to calculate the symbol error probability. However, as 
already commented on in Chapter 2, the difference between this and the original definition 
for symbol error probability is negligible in practice. 

The definition for the free distance of a VLEC code gives us also another way in 
which to say that a code is uniquely decodable. A code with free distance of one or more 
implies that all paths in the trellis are distinguishable and hence the code is uniquely 
decodable. Obviously, for the code also to be able to correct errors, we require that the free 
distance be at least equal to three. 

The constraint length of a VLEC code, as in the case of convolutional codes, is 
directly related to the number of states required in the decoder for optimal decoding. It also 
indirectly influences the free distance of the code, since it is directly related to the 
codeword lengths. In convolutional codes, the constraint length also has a direct bearing 
on the required decoding window depth in the Viterbi decoder. A similar relation also 
seems to exist for VLEC codes, however more experimental data is required to get a more 
exact picture. We have already commented that as a rule of thumb, the decoding window 
depth for VLEC codes should be about five times the maximum codeword length. 
However, when the gcd of the codeword lengths is not equal to one, this factor is less. In 
the extreme case, when we have two-length codes and the constraint length is one, a 
decoding window depth equal to the maximum codeword length (instantaneous decoding) 
is sufficient for near maximum likelihood decoding. 






Chapter 4. 

Sequential Decoding 



4.1. Introduction 

The maximum likelihood decoding algorithm for VLEC codes given in Chapter 3 
searches through all the possible paths in the code tree of a given length and chooses the 
one at the minimum Hamming distance to the received sequence. It does this in the most 
efficient way possible by using the modified Viterbi algorithm. This ensures that paths in 
the tree which cannot form part of the maximum likelihood path are discarded as soon as 
possible, and hence this limits the search space in the code tree. 

Another way to reduce the search space in the code tree is to do a sequential search 
by extending in turn those paths of the code tree which are most likely. If the distance 
between any two paths in the code tree increases with increasing path length (by effectively 
using a non-catastrophic VLEC code), then, if a wrong path is chosen at some point, its 
distance to the correct path will continue to grow with the path length. Eventually, the 
decoder notices that this is no longer a good path and backtracks to some previous path. 
Hence, if there are no errors on the channel, the correct path is extended immediately and 
the decoding process is much faster than with the modified Viterbi algorithm. However, if 
the number of errors is large, the decoder may search a large proportion of the code tree 
before finding the correct path (if this is possible), thus becoming more complex than the 
modified Viterbi algorithm. 

In this chapter we shall derive an optimum metric which determines the most likely
path to follow in sequential decoding of VLEC codes. The stack algorithm is adapted to
perform sequential decoding of VLEC codes [Buttigieg & Farrell, 1994c]. We shall also
characterise the necessary requirements for near optimal decoding. The computational
effort required for sequential decoding will be compared with that required for maximum
likelihood decoding, and a condition for the former to be smaller will be given. Due to the
similarity between VLEC and convolutional codes, this work mirrors similar work done for
convolutional codes. However, some important differences will be highlighted.

4.2. Metric for Sequential Decoding 

In Section 3.2 we have found that for maximum likelihood decoding we must decode 
that path in the code tree which is at the minimum Hamming distance to the received bit 
sequence. Hence, it is reasonable to assume that for sequential decoding the required 
metric must be a function of this distance as well. However, unlike in the case of the 
modified Viterbi algorithm, here we will be comparing paths with unequal numbers of bits. 
As a result, even a very good, but long, path may be at a greater Hamming distance to the 
received sequence than a much shorter, very bad path. Hence, we must also take into 
account the length of the paths being considered, biasing our decisions in such a way that 
longer paths are preferred, since these would be much more likely. Therefore, the required 
metric must also be a function of the path length. 

In Section 2.8 we dealt with Massey's optimum metric for variable-length codes 
[Massey, 1972] which, with further manipulations, reduces to the Fano metric, heuristically 
first introduced by Fano [1963]. As is well known, this is the preferred metric in sequential 
decoding of convolutional codes [Lin & Costello, 1983]. Hence, it is reasonable to expect 
that this metric, slightly modified, may also be useful for sequential decoding of VLEC 
codes. 

Consider a VLEC code C ($s_1@L_1, b_1$; $s_2@L_2, b_2$; ...; $s_\sigma@L_\sigma, b_\sigma$; $d_{\min}$, $c_{\min}$).
Expanding the context of equation (2.20) given on page 55 to include a sequence of
codewords, and assuming that the message sent has N bits, we have

$$\Pr(u_m, y) = P_m \prod_{i=1}^{N_m} P(y_i \mid u_{mi}) \prod_{j=1}^{N-N_m} P_0(y_{N_m+j}) \qquad (4.1)$$

where $P_0(y_i)$ is given by equation (2.21) on page 55, $u_m$ is a path through the tree consisting
of a sequence of $\eta_m$ codewords of length $N_m$ bits given by
$u_m = c_{m_1} c_{m_2} \cdots c_{m_{\eta_m}} = u_{m1} u_{m2} \cdots u_{mN_m}$
with $c_{m_1}, c_{m_2}, \ldots, c_{m_{\eta_m}} \in C$, $y = y_1 y_2 \cdots y_N$ is the received N-bit sequence and
$P_m = P(c_{m_1}) P(c_{m_2}) \cdots P(c_{m_{\eta_m}})$ is the probability of path $u_m$.

Then, following Massey [1972], dividing (4.1) by $\prod_{i=1}^{N} P_0(y_i)$, which is constant for
all messages $u_m$, and taking logs, we have

$$F(u_m, y) = \sum_{i=1}^{N_m} \left[ \log \frac{P(y_i \mid u_{mi})}{P_0(y_i)} + \frac{1}{N_m} \log P_m \right] \qquad (4.2)$$



Hence, given a list of unequal length paths, the most probable path is the one which 
maximises equation (4.2). Therefore, F(u m , y) is the required metric for sequential 
decoding of VLEC codes and is essentially the same as the Fano metric. 

For binary VLEC codes over the BSC with cross over probability p, F(u m , y) may be 
written as 

$$F(u_m, y) = \sum_{i=1}^{\eta_m} \Big\{ H(c_{m_i}, y_{m_i}) \log p + [l_{m_i} - H(c_{m_i}, y_{m_i})] \log(1-p) + \log P(c_{m_i}) - W(c_{m_i}) \log[p\,Q(0) + (1-p)\,Q(1)] - [l_{m_i} - W(c_{m_i})] \log[(1-p)\,Q(0) + p\,Q(1)] \Big\} \qquad (4.3)$$

where $y_{m_1} y_{m_2} \cdots y_{m_{\eta_m}}$ is the decomposition of the received sequence y with respect to
the codeword sequence $c_{m_1} c_{m_2} \cdots c_{m_{\eta_m}}$ with $|y_{m_i}| = |c_{m_i}| = l_{m_i}$ for all $i = 1, 2, \ldots, \eta_m$,
$W(c_{m_i})$ is the Hamming weight of codeword $c_{m_i}$ and Q(0), Q(1) are the probabilities of sending a 0
and a 1 respectively. For an efficient code, $Q(0) \approx Q(1)$. If we take this approximation, the
metric given by equation (4.3) is simplified to

$$F(u_m, y) = \sum_{i=1}^{\eta_m} \Big\{ H(c_{m_i}, y_{m_i}) \log p + [l_{m_i} - H(c_{m_i}, y_{m_i})] \log(1-p) + \log P(c_{m_i}) - l_{m_i} \log \tfrac{1}{2} \Big\} \qquad (4.4)$$
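
To make the metric concrete, the sketch below evaluates the contribution of a single codeword to the sum in equation (4.3), using base-2 logarithms. The function name and argument layout are assumptions made for illustration only. With q0 = 0.5 the expression reduces to the simplified form (4.4). The values p = 0.01 and Q(0) = 0.5256 are those quoted in Example 4.1.

```python
import math


def branch_metric(codeword, received, prob, p, q0=0.5):
    """Metric contribution of one codeword, following equation (4.3).

    codeword, received: equal-length bit strings such as "10110"
    prob: a priori probability of the codeword
    p: BSC cross-over probability; q0: probability Q(0) of sending a 0
    """
    l = len(codeword)
    h = sum(c != r for c, r in zip(codeword, received))   # Hamming distance
    w = codeword.count("1")                               # Hamming weight
    q1 = 1.0 - q0
    return (h * math.log2(p) + (l - h) * math.log2(1 - p) + math.log2(prob)
            - w * math.log2(p * q0 + (1 - p) * q1)
            - (l - w) * math.log2((1 - p) * q0 + p * q1))


p, q0 = 0.01, 0.5256
per_bit = branch_metric("0", "0", 1.0, p, q0)
print(round(per_bit, 3))                                        # 0.915
print(round(branch_metric("1", "1", 1.0, p, q0) - per_bit, 3))  # 0.145
print(round(branch_metric("0", "1", 1.0, p, q0) - per_bit, 3))  # -6.629
```

The three printed values are the per-bit, per-weight and per-error coefficients of the metric for this choice of p and Q(0).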



4.3. Stack Algorithm 

The modified Fano metric given by equation (4.3) (or the simplified version given by
(4.4)) is the log probability of a path $u_m$ in the code tree of a VLEC code, given the
received sequence y. Hence, given a set of paths through the code tree, we can determine
the most likely path transmitted by choosing the one with the maximum metric. Having
decided which of the current paths is the most likely, we then extend this path by
considering the next branches (codewords) on this path. This will give a new set of paths
searched in the code tree and the process is repeated until a path of length N bits is found.
This is the idea behind the stack algorithm used for the sequential decoding of
convolutional codes introduced independently by Jelinek [1969] and Zigangirov [1966]. In
this algorithm the searched paths and their metrics are stored sorted in a stack, hence the
name. Here, this is simply extended for a VLEC code C ($s_1@L_1, b_1$; $s_2@L_2, b_2$; ...;
$s_\sigma@L_\sigma, b_\sigma$; $d_{\min}$, $c_{\min}$).

1. Put the path representing bit position 0 (the root node of the code tree) in the stack 
and assign a metric 0 to this path. 

2. Extend the path at the top of the stack thus generating s new paths and compute the 
metrics of these new paths using equation (4.3) (or (4.4), as the case may be). This is 
simply the metric of the path at the top of the stack added to the metric for each 
respective codeword. 

3. From the paths generated in step 2 retain only the maximum metric paths for each
different codeword length. Hence, the number of extended paths is reduced from s to
$\sigma$.

4. Delete the top path from the stack. 

5. Insert the paths retained in step 3 in the stack in such a way that the stack contains 
paths with decreasing metric values. 

6. If the top path in the stack has the same number of bits as that transmitted, then stop 
and output the information symbols corresponding to this path. Otherwise, repeat 
from step 2. 
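
The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the simulations: it uses the simplified metric (4.4) with base-2 logarithms, it keeps the stack as a heap rather than a sorted list, it checks the stopping condition of step 6 when a path is taken from the top of the stack, and it simply discards any extension that would exceed the received length. The function name and the three-codeword toy code are hypothetical.

```python
import heapq
import math


def stack_decode(code, probs, y, p):
    """Sequentially decode the bit string y; returns (symbols, metric) or None."""
    n = len(y)

    def branch(cw, seg, prob):
        # Simplified metric (4.4) in base 2, so -l*log2(1/2) becomes +l.
        h = sum(a != b for a, b in zip(cw, seg))
        l = len(cw)
        return h * math.log2(p) + (l - h) * math.log2(1 - p) + math.log2(prob) + l

    # heapq is a min-heap, so entries hold the negated metric.
    stack = [(0.0, 0, "")]                       # step 1: root node, metric 0
    while stack:
        neg_metric, pos, path = heapq.heappop(stack)   # take and delete the top path
        if pos == n:                             # step 6: full-length path on top
            return path, -neg_metric
        best = {}                                # step 3: best extension per length
        for sym, cw in code.items():             # step 2: extend by every codeword
            end = pos + len(cw)
            if end > n:                          # would overshoot the message
                continue
            m = -neg_metric + branch(cw, y[pos:end], probs[sym])
            if len(cw) not in best or m > best[len(cw)][0]:
                best[len(cw)] = (m, end, path + sym)
        for m, end, new_path in best.values():   # step 5: insert in metric order
            heapq.heappush(stack, (-m, end, new_path))
    return None


code = {"a": "000", "b": "0110", "c": "10111"}   # hypothetical toy code
probs = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
received = "010011010111000"                      # "abca" with the second bit inverted
print(stack_decode(code, probs, received, 0.01)[0])   # abca
```

In this run the decoder first follows the path starting with b, whose metric is slightly larger, and returns to the path starting with a once the extensions of b have fallen below it.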

Since the decoded path may not be the correct path, we may have the situation
that the decoded path does not have the same number of bits as that transmitted. Hence, in
this case care should be exercised at step 6 in the above algorithm, because it may be that
the path at the top of the stack will never achieve this condition. This may be solved in two
ways. If the decoded path is allowed to have an unequal number of bits, then the algorithm
is stopped when the top path has the same number of bits as that transmitted or more.
However, a neater solution is the following. If we observe the trellis structure of a VLEC
code, we note that during the initial and final parts of the trellis there are "forbidden"
states (see footnote 1). For example, for code C4 given in Table 3.1, from the initial part of the
trellis shown in Figure 3.3, the initial forbidden states are $S_1$, $S_2$ and $S_5$. Similarly, if the total
number of bits transmitted is N, then the states $S_{N-1}$, $S_{N-2}$ and $S_{N-5}$ will also be forbidden,
since any path to these states cannot lead to a final path of length N bits. Hence, if all the
extended paths to the forbidden states are discarded as soon as they are
generated, then the above algorithm will always ensure that a final path of length N bits,
equal to that transmitted, will eventually reach the top of the stack. This has the added
advantage that the last codewords transmitted are offered the same protection level as the
others.

Another important thing to note is that following this algorithm, the size of the stack
grows at each iteration by $\sigma - 1$ entries. However, as we shall see, in practice it is found
that the size of the stack can be limited to a relatively small value without appreciably
degrading the performance of the decoder. In this case, those paths with the smaller metric
values are simply deleted from the stack.

The following example will clarify some of the points raised above. 

Example 4.1: Consider code C3 given in Table 2.3 and consider that the message
fcbfcdaa... is transmitted and that the sequence

01000010.11001111.10110.01101010.11001111.01111111.00000.00000... (see footnote 2)

is received over the BSC with crossover probability p = 0.01. For code C3 used to
encode source $A_1$ (also given in Table 2.3), the probabilities of transmitting a 0 or a 1 are
respectively Q(0) = 0.5256 and Q(1) = 0.4744. Hence, we can express the metric given by
equation (4.3) as

$$F(u_m, y) = \sum_{i=1}^{\eta_m} \{ -6.629\, H(c_{m_i}, y_{m_i}) + \log_2 P(c_{m_i}) + 0.915\, l_{m_i} + 0.145\, W(c_{m_i}) \}. \qquad (4.5)$$



1 Here we are not including those states that are always skipped throughout the trellis arising from the fact 
that the gcd of the lengths of the code would not be one. 

2 As usual, the bits received in error are shown in bold and the codeword boundaries are denoted by '.'.




The related stack evolution is shown in Figure 4.1, where paths present in the stack 
expressed as a symbol sequence are shown with the corresponding metric values. Notice 
that up to Step 6, the incorrect path agbcgc is being followed. This will in fact form the 
initial part of the decoded message if the decoder is an instantaneous one as given in 
Section 2.8 using the Massey metric in its original form as given in equation (2.20). 

Step 1           Step 2           Step 3           Step 4           Step 5
a         -2.37  ag        -0.41  agb       -2.63  agbc      -3.95  agbcg     -1.99
f         -6.91  f         -6.91  f         -6.91  f         -6.91  f         -6.91
                 aa       -13.74  agd      -10.82  agd      -10.82  agd      -10.82
                                  aa       -13.74  aa       -13.74  aa       -13.74
                                                   agba     -14.00  agba     -14.00
                                                                    agbca    -15.32

Step 6           Step 7           Step 8           Step 9           Step 10
agbcgc    -3.11  f         -6.91  fc        -3.54  fcb       -1.21  fcbf       1.21
f         -6.91  agbcgcb  -10.03  agbcgcb  -10.03  agbcgcb  -10.03  agbcgcb  -10.03
agd      -10.82  agd      -10.82  agd      -10.82  agd      -10.82  agd      -10.82
agbcgb   -13.40  agbcgch  -10.84  agbcgch  -10.84  agbcgch  -10.84  agbcgch  -10.84
aa       -13.74  agbcgb   -13.40  agbcgb   -13.40  agbcgb   -13.40  fcba     -12.63
agba     -14.00  aa       -13.74  aa       -13.74  aa       -13.74  agbcgb   -13.40
agbca    -15.32  agba     -14.00  agba     -14.00  agba     -14.00  aa       -13.74
                 agbca    -15.32  agbca    -15.32  agbca    -15.32  agba     -14.00
                                  fa       -18.28  fa       -18.28  agbca    -15.32
                                                   fcd      -18.65  fa       -18.28
                                                                    fcd      -18.65

Step 11          Step 12          Step 13          Step 14
fcbfc      4.58  fcbfcd     8.06  fcbfcda   10.18  fcbfcdaa  12.29
agbcgcb  -10.03  fcbfcb    -6.84  fcbfcb    -6.84  fcbfcb    -6.84
fcbfa    -10.15  agbcgcb  -10.03  agbcgcb  -10.03  agbcgcb  -10.03
agd      -10.82  fcbfa    -10.15  fcbfa    -10.15  fcbfa    -10.15
agbcgch  -10.84  agd      -10.82  agd      -10.82  agd      -10.82
fcba     -12.63  agbcgch  -10.84  agbcgch  -10.84  agbcgch  -10.84
agbcgb   -13.40  fcba     -12.63  fcbfcde  -11.82  fcbfcde  -11.82
aa       -13.74  agbcgb   -13.40  fcba     -12.63  fcba     -12.63
agba     -14.00  aa       -13.74  agbcgb   -13.40  agbcgb   -13.40
agbca    -15.32  agba     -14.00  aa       -13.74  aa       -13.74
fa       -18.28  agbca    -15.32  agba     -14.00  agba     -14.00
fcd      -18.65  fa       -18.28  agbca    -15.32  agbca    -15.32
                 fcd      -18.65  fa       -18.28  fa       -18.28
                                  fcd      -18.65  fcd      -18.65
                                                   fcbfcda?      ?

Figure 4.1: Evolution of stack contents in Example 4.1






Clearly in this case, the symbol error probability will be large, because the first codeword is
decoded to a different length codeword causing loss of synchronisation. However, this loss
of synchronisation is indirectly detected by the stack algorithm in Step 7, when the metric
of the incorrect path agbcgcb falls below that of the correct path f.

Notice also from Figure 4.1 that when there are no errors on the channel, the metric
of the correct path increases, whereas the metric of the incorrect paths generally decreases
(but see Section 4.5). Note that in Step 14 we require more bits to determine the path
fcbfcda?.

4.4. Performance 

The algorithm presented in the previous section, although it maximises the
probability of choosing the most likely path to extend at each state, does not necessarily
choose the most likely path to the final state in the trellis, as is the case for the modified
Viterbi decoder given in Chapter 3. This holds irrespective of any physical constraints imposed
on the algorithm so as to ensure limited computation time and buffer space. Simulation
results have shown, however, that if the VLEC code is properly designed, then the two
algorithms will have practically the same performance.

For the case of maximum likelihood decoding, we have seen in Section 3.5.1 that the 
performance of a VLEC code is greatly influenced by the value of the free distance of the 
code. This is also a well-known result for the case of convolutional codes [Viterbi, 1971]. 
Chevillat and Costello [1978] have shown that this is also true for sequential decoding
of convolutional codes provided that the distance growth of the code exceeds some lower 
limit. For this reason they define a column distance function for convolutional codes, 
which measures the minimum Hamming distance between merged and unmerged 
codewords of the convolutional code of a given length. We shall now define a similar 
function for VLEC codes. 

4.4.1. Column Distance Function 

Definition 4.1: The column distance function (CDF), $d_c(\eta)$, of a VLEC code is
defined as the minimum Hamming distance between any two paths in the code tree with $\eta$
symbols (or transitions) in the shorter (in terms of bits) path, with the condition that the
first codewords in each path being compared are of unequal length, i.e.

$$d_c(\eta) = \min\{ H(p^i_{0,q}, p^j_{0,r}) : p^i_{0,q}, p^j_{0,r} \in T,\ q \le r,\ \|p^i_{0,q}\| = \eta \text{ and } |(p^i_{0,q})_1| \ne |(p^j_{0,r})_1| \} \qquad (4.6)$$

where, as before, $p^i_{0,q}$ represents a path, or a sequence of codewords, through the tree or
trellis starting at state $S_0$ and ending at state $S_q$, with $\|p^i_{0,q}\|$ codewords. Since $q \le r$,
$H(p^i_{0,q}, p^j_{0,r})$ is the Hamming distance between two possibly unequal length sequences.
This is defined to be the Hamming distance between those parts of the sequences of the
same length; i.e., $H(p^i_{0,q}, p^j_{0,r})$ is really the diverging distance between the two paths (if the
two paths are considered to be codewords).

This definition for the CDF is similar to the one for convolutional codes [Chevillat & 
Costello, 1978], however due to the variable-length nature of the codes, the codewords 
being compared may not necessarily be of the same length. 
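
For very small codes the CDF can be checked by exhaustive enumeration straight from Definition 4.1. The sketch below is such a brute-force check and is not the stack-based search of Figure 4.2; its cost grows exponentially with the number of codewords in the path. The function name and the two-codeword example are hypothetical, and the sketch assumes that the longer path may also have the same number of bits as the shorter one.

```python
def column_distance(codewords, eta):
    """Brute-force column distance for paths of eta codewords (toy codes only)."""

    def sequences(n_words):
        # All concatenations of exactly n_words codewords, with their first codeword.
        paths = [("", None)]
        for _ in range(n_words):
            paths = [(bits + cw, first or cw)
                     for bits, first in paths for cw in codewords]
        return paths

    def covering(first, n_bits):
        # All paths that start with `first` and have just reached n_bits or more.
        frontier, done = [first], []
        while frontier:
            bits = frontier.pop()
            if len(bits) >= n_bits:
                done.append(bits)
            else:
                frontier.extend(bits + cw for cw in codewords)
        return done

    best = None
    for short, first in sequences(eta):
        for other_first in codewords:
            if len(other_first) == len(first):
                continue                  # first codewords must differ in length
            for longer in covering(other_first, len(short)):
                # zip stops at the shorter path, so only the common part is compared
                d = sum(a != b for a, b in zip(short, longer))
                best = d if best is None else min(best, d)
    return best


toy = ["000", "1111"]                      # hypothetical two-codeword code
print(column_distance(toy, 1), column_distance(toy, 2))   # 3 3
```

A value of None is returned when all codewords have the same length, since no pair of paths then satisfies the condition on the first codewords.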

Theorem 4.1: The CDF of a VLEC code is a monotonically increasing function, i.e.

$$d_c(\eta) \le d_c(\eta+1) \qquad (4.7)$$

Proof. Remove the last codeword from that path which has $\eta + 1$ codewords and is at
Hamming distance $d_c(\eta+1)$ to some other path in the tree. Now, at best, the minimum
Hamming distance of this $\eta$-symbol path to all the other (longer) paths in the tree, $d_\eta$,
remains unchanged or decreases by some value, i.e.

$$d_\eta \le d_c(\eta+1).$$

But, by definition,

$$d_c(\eta) \le d_\eta,$$

and hence

$$d_c(\eta) \le d_c(\eta+1). \qquad (4.7)$$

■

The implication of the CDF is that any path which has $\eta$ codewords is at least $d_c(\eta)$
distance away from any other path in the tree, of the same or longer length.

From Theorem 3.3 (page 75), we have seen that the free distance of a VLEC code C
($s_1@L_1, b_1$; $s_2@L_2, b_2$; ...; $s_\sigma@L_\sigma, b_\sigma$; $d_{\min}$, $c_{\min}$) is bounded by

$$d_{free} \ge \min(b_{\min}, d_{\min} + c_{\min}). \qquad (3.18)$$






Now, in the definition for the CDF we have removed from the possible path
comparisons those paths whose first codewords are of equal length. This effectively removes
Case 1 in Theorem 3.3.

Definition 4.2: The unequal length free distance, $d_u$, of a VLEC code is defined to be
the minimum Hamming distance in the set of all arbitrarily long paths that diverge from
some common state $S_i$ due to unequal length codewords and converge again in another
common state $S_j$, $j > i$.

Theorem 4.2: The unequal length free distance of a VLEC code C ($s_1@L_1, b_1$; $s_2@L_2, b_2$;
...; $s_\sigma@L_\sigma, b_\sigma$; $d_{\min}$, $c_{\min}$) is bounded by

$$d_u \ge d_{\min} + c_{\min}. \qquad (4.8)$$

Proof. Since parallel transitions are not considered for the unequal length free
distance, then the only case to consider is Case 2 of Theorem 3.3. ■

Corollary 4.1: The unequal length free distance of a VLEC code is always greater
than or equal to the free distance of the code, i.e.

$$d_u \ge d_{free} \qquad (4.9)$$

Proof. From Theorem 3.3 it is clear that if $b_{\min} < d_{\min} + c_{\min}$, then $d_{free} = b_{\min}$. Hence
in this case, since $d_u \ge d_{\min} + c_{\min}$, then $d_u > d_{free}$. Next, if $b_{\min} \ge d_{\min} + c_{\min}$ then
$d_{free} \ge d_{\min} + c_{\min}$. In other words, the free distance is not influenced by the parallel
transitions and hence by Definition 4.2 in this case $d_u = d_{free}$. ■

Note that for VLEC codes with $\sigma = s$, i.e. all codewords are of different lengths, then
$d_u = d_{free}$.

Theorem 4.3: The CDF of a VLEC code is bounded by

$$d_c(\eta) \le d_u \qquad (4.10)$$

for all $\eta$.

Proof. Consider two paths which are at the unequal length free distance $d_u$ (see footnote 3). Then,
$d_c(\eta') \le d_u$, where $\eta'$ is the number of codewords in that path with the least number of
codewords, from these two. Now, adding the same codeword to both paths i times will not
increase the distance between the two paths at all. Hence, $d_c(\eta'+i) \le d_u$ for all i = 0, 1, 2,
.... Now, since $d_c(\eta)$ is monotonically increasing, then $d_c(\eta) \le d_u$ for all $\eta$. ■



3 Note that this implies that the first codewords of both paths are of unequal length. 




[Flowchart; box contents in order of appearance]

START

Load stack with all pairwise comparisons between different length codewords

Order stack according to
  1. Minimum distance
  2. Maximum number of codewords

d = 0

Extend short path at top of stack, re-computing the new distances

Order stack according to
  1. Minimum distance
  2. Maximum number of codewords

d = distance of comparison at top of stack

d is the value of the CDF for $\eta$, where $\eta$ is the number of codewords in the shorter (in
bits) of the two paths at the top of the stack

Figure 4.2: Computing the CDF for a VLEC code






Figure 4.2 shows a flowchart for an algorithm to compute the CDF of a VLEC code.
This algorithm is derived from a similar one used to compute the CDF of a convolutional
code given by Lin and Costello [1983]. It is essentially a sequential search through the
code tree using the stack algorithm. The stack contains comparisons between pairs of paths
$p^i_{0,q}$ and $p^j_{0,r}$. Associated with each comparison is the Hamming distance between the two
paths, $H(p^i_{0,q}, p^j_{0,r})$, and the number of codewords. This is taken to be the number of
codewords in the shorter (in terms of bits) of the two paths. The stack is ordered after each
iteration, with the comparison having the maximum number of codewords among those
having the minimum distance being placed at the top of the stack. The search proceeds by
extending each time the shorter of the two paths at the top of the stack. The algorithm ends
when the paths at the top of the stack merge, i.e. when they contain the same number of
bits. This condition is never reached, however, for the case of sequentially catastrophic
codes, defined in Section 4.4.2. In this case, some other criterion must be used to stop the
algorithm, such as a maximum number of codewords in the comparison at the top of the
stack.

4.4.2. Sequentially Catastrophic VLEC Codes 

The CDF fails to attain the bound given by expression (4.10) only for certain
classes of VLEC codes. For example, for the catastrophic code C5 given in Table 3.2,
$d_u = d_{free} = 4$ but $d_c(\eta) < 2$. In fact, it is quite obvious that any catastrophic VLEC code as
defined in Section 3.4.3 will have $d_c(\infty) < d_u$. However, there are other codes which
exhibit this behaviour. For example, code C6 given in Table 3.3 is not catastrophic as
defined in Section 3.4.3; however, for this code $d_c(\eta) < 2$ for all $\eta$, even if $d_u = 4$. As we
shall see, this may give problems when sequentially decoding VLEC codes.

Definition 4.3: A VLEC code with $d_c(\infty) < d_u$ is defined to be sequentially
catastrophic.

Suppose we transmit a VLEC code C over the BSC with cross-over probability p. 
Further, assume that each codeword occurs with equal probability X > where s is the 
number of codewords in C. As usual, let L a be the length of the maximum length 



113 



Chapter 4 - Sequential Decoding 



codeword. Suppose that the metric of the correct path at some state is Mo and that due to 
channel errors, the next path to be extended is incorrect due to the fact that the metric M2 of 
this path is larger than the metric M\ of the correct path, as shown in Figure 4.3. Assume 
that there are no further errors on the channel and that the simplified metric for the stack 
algorithm given by equation (4.4) is being used. Now, if the metric of the incorrect path 
remains always greater than M\, the correct path will never be extended, causing decoding 
errors 4 . The change in the path metric, AM, after each path extension is given by 

    \Delta M = H \log p + (l - H) \log(1 - p) + \log\frac{1}{s} - l \log\frac{1}{2}        (4.11)

where l is the length of the codeword used in the path extension and H is the Hamming
distance of this codeword to the correct path. Since l depends on the decoded codeword,
it is useful to bound ΔM. For 0 < p < 0.5,

    \Delta M \le H \log p + (L_\sigma - H) \log(1 - p) - \log s - L_\sigma \log\frac{1}{2}        (4.12)



Figure 4.3: Necessary condition for correct decoding with sequential decoding
(diagram labels: correct path; incorrect path with metric M_2)



Hence, if ΔM > 0, then the incorrect path will be extended at the expense of the
correct path. Therefore, using the bound given by expression (4.12), incorrect decoding
occurs if

    H \log p + (L_\sigma - H) \log(1 - p) - \log s - L_\sigma \log\frac{1}{2} > 0

    \Rightarrow \quad H < \frac{L_\sigma \log[2(1 - p)] - \log s}{\log\frac{1 - p}{p}}        (4.13)



4 This is not a necessary condition, though, since what is really required is that the metric along the incorrect 
path is always above the metric of the correct path at some arbitrary position. 




Hence, if the CDF for a code does not increase by more than the RHS of expression 
(4.13), then, for the given cross-over probability, sequential decoding for this code may 
perform worse than maximum likelihood decoding. 
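
The argument above can be tried out with a small stack-decoder sketch. It is a minimal illustration, not the thesis algorithm verbatim: it assumes equiprobable codewords so that the metric increment of expression (4.11) applies, uses base-2 logarithms, keeps every extension rather than only the best one per codeword length group, and stops as soon as the path at the top of the stack accounts for all received bits. The names delta_metric and stack_decode are chosen here for illustration only.

```python
import math


def delta_metric(h, l, p, s):
    """Metric change of expression (4.11) for one path extension.

    h: Hamming distance between the codeword and the received bits,
    l: codeword length, p: BSC cross-over probability, s: number of codewords.
    """
    return (h * math.log2(p) + (l - h) * math.log2(1 - p)
            + math.log2(1 / s) - l * math.log2(0.5))


def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def stack_decode(received, code, p, max_stack=100):
    """Sequentially decode a received bit string with a (simplified) stack algorithm.

    code maps source symbols to codewords (bit strings). Returns the decoded
    symbol string, or None if no path spans the whole received sequence.
    """
    s = len(code)
    stack = [(0.0, 0, "")]  # entries: (metric, bits consumed, decoded symbols)
    while stack:
        metric, pos, symbols = stack.pop(0)  # best path is kept at index 0
        if pos == len(received):
            return symbols
        for sym, cw in code.items():
            end = pos + len(cw)
            if end > len(received):
                continue  # this codeword would overrun the received sequence
            h = hamming(cw, received[pos:end])
            stack.append((metric + delta_metric(h, len(cw), p, s), end, symbols + sym))
        stack.sort(key=lambda entry: entry[0], reverse=True)
        del stack[max_stack:]  # enforce the maximum stack size
    return None
```

For example, with the two-codeword code {"a": "01101", "b": "10010010"} and p = 0.01, the correct extension of an error-free codeword increases the metric, while an incorrect extension lowers it sharply, so the correct path stays at the top of the stack.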

4.4.3. Simulation Results 

Code C9 (1@5,-; 1@8,-; 5, 5) given in Table 4.1 is a sequentially catastrophic code
with free distance 10. This is clear from Figure 4.4, which gives the CDF for the code,
since the CDF never attains the value of the unequal length free distance (which in this case
is the same as the free distance, since s = σ = 2). Code C10 (1@5,-; 1@8,-; 4, 3) is also a
free distance 10 VLEC code. However, in this case, the code is non-catastrophic, and any
path with five codewords is at least at Hamming distance 10 from all other (longer) paths
in the tree. Note that with the same codeword lengths, a free distance 12 code may also be
constructed. This is code C11 (1@5,-; 1@8,-; 5, 5) also given in Table 4.1. For this code,
any path with seven codewords is at least at Hamming distance 12 from all other (longer)
paths in the tree.




Figure 4.4: Column distance function for the three codes C9, C10 and C11
(horizontal axis: number of symbols, η)



Source symbol    Code C9     Code C10    Code C11
a                00000       00110       01101
b                11111111    10001011    10010010

Table 4.1: Two-codeword codes with average codeword length of 6.5 bits for a uniform
source
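
The last two entries of each code descriptor above, (5, 5), (4, 3) and (5, 5), can be checked directly from the codewords of Table 4.1. The sketch below assumes that these entries are the minimum diverging and minimum converging distances, taken here to be the Hamming distances between a shorter codeword and, respectively, the equal-length prefix and suffix of a longer one; those definitions are given earlier in the thesis and are only assumed in this illustration. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def diverging_converging(codewords):
    """Return (d_min, c_min) over all pairs of unequal-length codewords.

    Assumed definitions: the diverging distance compares the shorter codeword
    with the prefix of the longer one, the converging distance with its suffix.
    """
    d_min = c_min = None
    for x, y in combinations(codewords, 2):
        if len(x) == len(y):
            continue  # equal-length pairs are covered by the block distance instead
        short, long_ = sorted((x, y), key=len)
        n = len(short)
        d = hamming(short, long_[:n])
        c = hamming(short, long_[-n:])
        d_min = d if d_min is None else min(d_min, d)
        c_min = c if c_min is None else min(c_min, c)
    return d_min, c_min


TABLE_4_1 = {
    "C9": ("00000", "11111111"),
    "C10": ("00110", "10001011"),
    "C11": ("01101", "10010010"),
}

for name, codewords in TABLE_4_1.items():
    print(name, diverging_converging(codewords))
```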



Figure 4.5 shows the performance curves for these three codes with a uniform source,
both with maximum likelihood decoding and with sequential decoding using the stack
algorithm, with a maximum stack size of 100. Notice that the performance of C9, which is
sequentially catastrophic (but not strictly catastrophic), with sequential decoding is worse
than that with maximum likelihood decoding. This difference translates into a 2 dB loss in
coding gain with sequential decoding at a SEP of 10^-3.

With maximum likelihood decoding, however, codes C9 and C10 have practically the
same performance, which shows that sequential catastrophicity is not important with
maximum likelihood decoding. What is important with maximum likelihood decoding is
the free distance, hence the performance of C11 is better than that of the other two. This is
also true with sequential decoding.

Codes C10 and C11 are not sequentially catastrophic and for both these codes, the
performance curves given in Figure 4.5 show that sequential decoding is asymptotically as
good as maximum likelihood decoding. Table 4.2 gives the required CDF growth for these
codes, for specific values of p, for the condition given by expression (4.13) to be satisfied.
Now, from Figure 4.4, the CDF growth for both C10 and C11 is approximately one per
source symbol. Hence, this requires p to be smaller than 0.01. Note, however, that
whereas the performance of code C11 with sequential decoding is still slightly worse than
with maximum likelihood decoding for the range of p shown in Figure 4.5, as expected, that
of C10 is practically the same. Therefore, care should be exercised when applying the
condition given by expression (4.13), due to the simplifying assumptions made in deriving
this expression.

Figure 4.5: Comparing performance of maximum likelihood and sequential
decoding for codes C9, C10 and C11
(curves: maximum likelihood and sequential decoding for each code; horizontal axis:
cross-over probability, p)



Cross-over probability, p    Required CDF growth
0.2000                       2.2123
0.1000                       1.8246
0.0100                       1.0384
0.0010                       0.7013
0.0001                       0.5267

Table 4.2: Required CDF growth to satisfy condition given by expression (4.12) for codes
C9, C10 and C11
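
The entries of Table 4.2 can be reproduced numerically. The sketch below assumes that the required CDF growth is the bound on H obtained by solving the inequality of expression (4.13) for H, namely [L_σ log 2(1 - p) - log s] / log[(1 - p)/p], evaluated with L_σ = 8 and s = 2 for the codes of Table 4.1. The function name is chosen here for illustration.

```python
import math


def required_cdf_growth(p, max_len, s):
    """Bound on H obtained by solving expression (4.13) for H.

    p: BSC cross-over probability (0 < p < 0.5),
    max_len: length of the longest codeword (L_sigma),
    s: number of codewords (equiprobable codewords assumed).
    The ratio does not depend on the base of the logarithm.
    """
    numerator = max_len * math.log2(2 * (1 - p)) - math.log2(s)
    denominator = math.log2((1 - p) / p)
    return numerator / denominator


for p in (0.2, 0.1, 0.01, 0.001, 0.0001):
    print(f"{p:.4f}    {required_cdf_growth(p, max_len=8, s=2):.4f}")
```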

As a further example, consider codes C15 and C17 given in Table A.2. Their
respective CDFs are given in Figure 4.6, from which we may deduce that C17 has free
distance 5 (using also Corollary 4.1) and that C15 is sequentially catastrophic. In fact, C15
is strictly catastrophic, since the two semi-infinite messages jazazaz... and zhfhfhf... are at
Hamming distance 3 from each other. Figure 4.7 shows the codes' performance both with
sequential and maximum likelihood decoding. First of all, it is interesting to note that code
C15, which is catastrophic, with maximum likelihood decoding performs almost as well as
C17, which is not catastrophic. With sequential decoding, however, C15 performs worse
than with maximum likelihood whereas the performance of code C17 is practically the same
with both decoding methods. It is worthwhile noting, though, that the difference in
performance for C15 is negligible.






Figure 4.8 shows the effect of the stack size for code C17 given in Table A.2. Note
that for a stack size of just one, the stack algorithm is equivalent to the tail decoding
algorithm using the Massey metric as given in Section 2.8. Notice that with a stack size of
just fifteen, the performance of the stack algorithm is almost as good as that of maximum
likelihood decoding. In fact, for all non-sequentially catastrophic VLEC codes considered,
it has been found that for a relatively small stack size the performance with the stack
algorithm is nearly as good as that with maximum likelihood decoding.

Figure 4.6: Column distance function for the three codes C15, C16 and C17
(horizontal axis: number of symbols, η)

Figure 4.7: Comparison between maximum likelihood and sequential decoding for
codes C15 and C17
(curves: maximum likelihood and sequential decoding for each code; horizontal axis:
cross-over probability, p)

Figure 4.8: Effect of stack size on performance of sequential decoding for code C17
(horizontal axis: Eb/N0 in dB)

Figure 4.9 shows the performance of sequential decoding for code C17 with a
maximum stack size of 50 using both the full metric as given in equation (4.3) and the
approximate metric as given in equation (4.4). Notice that in this case the performance is
practically the same with both metrics. The main reason for this is that the probabilities of
a '0' and a '1' for code C17 when used to encode the 26-symbol English source are nearly
equal. In fact, in this case, Q(0) = 0.5457 and Q(1) = 0.4532. If, on the other hand, we
were to consider code C9 given in Table 4.1 with a uniform source, where Q(0) = 0.3846
and Q(1) = 0.6154, we would expect that the performance with sequential decoding using
the full metric should be better than with the simplified one. In fact this is the case in
practice, as shown in Figure 4.10.
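
The values of Q(0) and Q(1) quoted for code C9 can be checked with a short calculation. The sketch below assumes that Q(0) and Q(1) are the long-run fractions of 0s and 1s at the channel input, i.e. the expected number of each bit per codeword divided by the average codeword length; this reading reproduces the figures quoted for C9 with a uniform source. The function name is hypothetical.

```python
def channel_bit_probabilities(code, probs):
    """Return (Q(0), Q(1)) for a variable-length code.

    code maps source symbols to codewords (bit strings) and probs maps the
    same symbols to their probabilities. Assumption: Q(b) is the expected
    count of bit b per codeword divided by the average codeword length.
    """
    avg_len = sum(probs[sym] * len(cw) for sym, cw in code.items())
    avg_ones = sum(probs[sym] * cw.count("1") for sym, cw in code.items())
    q1 = avg_ones / avg_len
    return 1.0 - q1, q1


code_c9 = {"a": "00000", "b": "11111111"}  # code C9 of Table 4.1
uniform = {"a": 0.5, "b": 0.5}
q0, q1 = channel_bit_probabilities(code_c9, uniform)
print(f"Q(0) = {q0:.4f}, Q(1) = {q1:.4f}")
```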



Figure 4.9: Performance of C17 with exact and approximate metrics
(horizontal axis: Eb/N0 in dB)



4.5. Complexity 

The main objective for developing sequential decoding for VLEC codes is to reduce 
the decoding complexity. This is, however, not as simple to analyse as in the case of the 
maximum likelihood decoding algorithm, since the number of paths searched is a random 
variable dependent on the code used and the state of the channel. In the case of 
convolutional codes, the computational effort has been upper bounded by Chevillat and 
Costello [1978]. This has been found to be exponentially dependent on the column 
distance growth of the code. The more rapidly the column distance grows, the less
computational effort is required to decode the code. This has also been previously verified
experimentally [Chevillat & Costello, 1976]. Other earlier work used random coding 
arguments to derive an upper bound on the computational effort on the whole ensemble of 
random tree codes [Savage, 1966a], [Savage, 1966b], [Jacobs & Berlekamp, 1967], 
[Forney, 1974]. This work has shown that the computational effort is Pareto distributed 
and that it does not depend on the constraint length of the convolutional code. Although no 
similar analysis has been attempted for sequential decoding of VLEC codes, it is 
reasonable to expect that these will exhibit similar properties. However, a few comments 
are in order. 

Figure 4.11 shows the average number of extended paths per transmitted symbol as it
varies with the SNR on the channel for codes C9, C10 and C11 given in Table 4.1, using a
maximum stack size of 100. Ideally, only one path is extended per transmitted symbol.
However, in practice this is not possible. The first point to note is that since these codes
have two codewords (s = 2), then the stack algorithm requires that for each path extended,
two new paths are generated^5. Hence, the minimum number of extended paths for each
transmitted symbol, if the correct path always stays at the top of the stack, is two. This
occurs when the SNR is high, resulting in few errors on the channel and provided that the
metric of the correct path increases at each extension. For the simplified metric given by
equation (4.4) this implies that, in the limit when p → 0, we require that



5 In general, s new paths are generated.

Figure 4.11: Number of extended paths for codes C9, C10 and C11
(horizontal axis: Eb/N0 in dB)

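    l_i > \log_2 \frac{1}{P(c_i)}        (4.14)

where l_i is the length of codeword c_i and P(c_i) is its probability (1/s for the
equiprobable case of expression (4.11)).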



For codes C9, C10 and C11 with a uniform source, condition (4.14) is satisfied.
Hence, the minimum average number of extended paths per transmitted symbol is indeed
two, as can be observed from Figure 4.11. Notice that lowering the SNR increases the
number of paths that are extended for each transmitted symbol. Figure 4.12 presents a
clearer picture. Here, the average number of extra paths extended per source symbol, over
and above the minimum number (in this case two), is plotted against Eb/N0.
Denote this number by E_x. From this figure we may deduce that there is an exponential
relationship between the number of extended paths and the value of Eb/N0, i.e.,
E_x ≈ a(Eb/N0)^b, for some constants a and b.

Assuming that all codewords are considered at each path extension^6, then at each
path extension we need to perform Σ_{i=1}^{s} l_i additions to evaluate the new metric for
each extended path. Hence, if we let ξ represent the average number of paths which visit
the top of the stack per transmitted source symbol, then

    \text{Average number of additions per source symbol} \approx \xi \sum_{i=1}^{s} l_i        (4.15)

Note that as Eb/N0 → ∞, ξ → 1 and that

    \xi = \frac{E_x}{s} + 1        (4.16)

In addition, for each path extension, we need to store in the stack that codeword with
the best metric for each different codeword length group. Hence, we need to perform s - σ
comparisons each time that a path is extended. Therefore:

    \text{Average number of comparisons per source symbol} \approx \xi (s - \sigma)        (4.17)

6 This is not exactly true if it is ensured that the decoded message has the same number of bits as that
transmitted, since in this case some of the states in the trellis are forbidden and consequently the
corresponding transitions are not extended.

Furthermore, each time that a path is extended, σ new paths need to be inserted into
the stack in the correct order according to their metric value. However, various techniques
may be used to simplify this operation by avoiding actually sorting the entries in the stack.
One such technique for the case of convolutional codes is the so-called stack-bucket
algorithm [Jelinek, 1969]. For convolutional codes this technique has little effect on the
performance of the stack algorithm. Even though this was not tested for VLEC codes, it is
expected that the behaviour should be similar. In addition, from the results obtained, some
of which were presented in Section 4.4.3, it has been found that the size of the stack for the
case of VLEC codes need not be large. Hence, the amount of computation in the sorting of
the stack can be made relatively small. Therefore, as a first order comparison with the
modified Viterbi decoding algorithm for VLEC codes, we shall neglect the computation
required in sorting the stack and will just compare the number of additions and
comparisons necessary. Hence, comparing equations (4.15) and (4.17) with their
respective counterparts for maximum likelihood decoding given by equations (3.48) and
(3.47), we may deduce that sequential decoding of a VLEC code is less complex than
maximum likelihood decoding if

    \xi < \frac{L_{average}}{g}        (4.18)

where L_average is the average codeword length and g is the gcd of the codeword lengths.

For codes C9, C10 and C11 with a uniform source, L_average/g = 6.5. So, for sequential
decoding to be less complex than maximum likelihood decoding, ξ must be less than 6.5.
Hence, from the curves shown in Figure 4.12, for Eb/N0 greater than 3 dB for codes C9, C10
and C11, sequential decoding is less complex than maximum likelihood decoding with a
uniform source. Notice that for code C9, which is sequentially catastrophic, the average
number of extended paths for any given Eb/N0 is larger than for the other two codes. Note
also that the numbers of paths extended for C10 and C11 are approximately the same, and
this is in conformity with the fact that the CDF growth for these two codes is approximately
equal.
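
The complexity comparison above reduces to a few lines of arithmetic. The sketch below assumes the relations read from the surrounding text, namely ξ = E_x/s + 1 for expression (4.16) and ξ < L_average/g for condition (4.18); the function names are hypothetical and the lengths list holds one entry per codeword.

```python
from functools import reduce
from math import gcd


def paths_at_top(e_x, s):
    """xi of (4.16): average paths visiting the top of the stack per symbol."""
    return e_x / s + 1


def complexity_threshold(lengths, probs):
    """Right-hand side of (4.18): average codeword length over the gcd of lengths."""
    l_average = sum(l * p for l, p in zip(lengths, probs))
    g = reduce(gcd, lengths)
    return l_average / g


def sequential_is_cheaper(e_x, lengths, probs):
    """True if condition (4.18) holds for the measured number of extra paths e_x."""
    return paths_at_top(e_x, len(lengths)) < complexity_threshold(lengths, probs)


# Codes of Table 4.1 with a uniform source: lengths 5 and 8, threshold 6.5.
print(complexity_threshold([5, 8], [0.5, 0.5]))
```

With no extra extended paths (E_x = 0) ξ equals one, so the condition holds for any code whose ratio L_average/g exceeds one.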

Consider now codes C15, C16 and C17 given in Table A.2 used to encode the 26-
symbol English source. The respective CDFs are given in Figure 4.6. The variation of E_x
with Eb/N0 for these codes is given in Figure 4.13 for a maximum stack size of 50. From
this figure it is surprising to note that there is almost no difference between the values of E_x
for all three codes, even though these codes have widely differing column distance growth
rates and in fact code C15 is even catastrophic. This seems to imply that the CDF growth
for the case of VLEC codes is not enough in itself to determine which code is less
computationally expensive with sequential decoding.

Figure 4.13: Extra extended paths for codes C15, C16 and C17
(horizontal axis: Eb/N0 in dB)

For these codes, sequential decoding is less complex than maximum likelihood
decoding if ξ is less than about 8.5, or E_x less than 195. Hence, from Figure 4.13,
this occurs when Eb/N0 is greater than 1 dB. Table 4.3 gives the computational load for
these codes with sequential decoding as a percentage of that required with maximum
likelihood decoding, where the computational load is the sum of the total number of
additions and comparisons required per source symbol. Notice that for reasonably
moderate values of Eb/N0, the computational load required by the sequential decoding is
about a tenth of that required by the modified Viterbi decoding algorithm.

Eb/N0 (dB)    C15 / %    C16 / %    C17 / %
1             106.1      108.2      115.7
2             40.6       41.9       43.8
3             19.5       20.3       19.9
4             13.7       14.0       13.7
5             12.1       12.1       12.0
6             11.6       11.7       11.5
7             11.5       11.5       11.4
8             11.5       11.5       11.4

Table 4.3: Percentage computational load for codes C15, C16 and C17 with the sequential
decoding algorithm as compared with the modified Viterbi algorithm

4.6. Conclusion

In this chapter we have given a sequential decoding algorithm, based on the stack
algorithm, for VLEC codes. We have seen that the performance of VLEC codes with
sequential decoding is as good as that with maximum likelihood decoding, for
non-sequentially catastrophic VLEC codes. However, the complexity for this algorithm can be
much less than that required with maximum likelihood decoding, especially for high SNR.
Even for Eb/N0 as low as 3 dB, equivalent to SEP as high as 0.01, sequential decoding is
already less complex than maximum likelihood decoding for the codes considered here.

The metric used in the sequential decoding algorithm is essentially the same as the
Fano metric used for convolutional codes. A simplified metric was shown to give
comparable results to the exact metric on condition that the channel input symbols occur
with approximately equal probabilities. This condition is usually met in practice, since
otherwise it would imply that the code is not efficient for the given source.

One interesting aspect found from the experimental results was that the maximum
stack size required for near optimal performance is small. A stack size of twenty entries
was enough for most codes considered, sometimes even less. As a rule of thumb, the stack
size must be chosen to be slightly larger than σ. This is because the free distance paths are,
in general (for non-sequentially catastrophic codes), very short. For instance, for code C17,
where a maximum stack size of fifteen is enough to give near optimal performance, 97% of
the free distance paths contain two codewords or less. This is advantageous since less
computational effort will be required to sort the stack.

The column distance function growth, although it does give indications as to whether
a code will perform well with sequential decoding, does not seem to be a very good test to
indicate the computational effort required with sequential decoding, unlike the case with
convolutional codes. One problem with the CDF is that it does not take into account the
probabilities of the paths involved. Hence, if the code is catastrophic only for one
particular message, then the probability of having catastrophic behaviour is likewise small
and thus sequential decoding would perform well in this case. It would be useful to bound
the computational effort required in sequential decoding of VLEC codes. This will then
indicate which parameters of VLEC codes are most important in this regard.

Another problem with sequential decoding, which was not addressed in this chapter,
is the random time required to decode a source symbol. This is a problem with any
sequential decoding algorithm, even for convolutional codes. The fact that the stack size is
small somewhat alleviates this problem.



Chapter 5. 

Synchronisation Properties 



5.1. Introduction 

Loss of synchronisation is said to occur when the decoder does not properly
determine the codeword boundaries (see also Section 2.4). In this chapter we shall be
considering four different mechanisms whereby we may have loss of synchronisation in
VLEC codes with maximum likelihood decoding. In the first case, considered in Section
5.2, we are going to assume the usual channel model whereby the number of received bits
is equal to that transmitted (the BSC). For the other three cases, we will use a different
channel model which allows symbol deletions or insertions. The three cases that we will
consider are those when: a number of initial bits are lost (Section 5.3); a number of
consecutive bits are lost within the message (Section 5.4.1); and a number of consecutive
(random) bits are gained within the message (Section 5.4.2).

5.2. Average Error Span on the Binary Symmetric Channel 

Consider a VLEC code C (s_1@L_1, b_1; s_2@L_2, b_2; ...; s_σ@L_σ, b_σ; d_min, c_min)
transmitted over the BSC with cross-over probability p and decoded using the maximum
likelihood decoder of Chapter 3. Using the same notation as introduced in that chapter,
consider an error event, whereby the correct path segment p^i_{q,r} associated with the
message sub-sequence a_i is incorrectly decoded into the path segment p^j_{q,r} associated
with the message sub-sequence a_j. If ‖p^i_{q,r}‖ = ‖p^j_{q,r}‖ = 1 then the error event
causes no loss of synchronisation, since the transmitted codeword is incorrectly decoded
into a same length codeword. However, if ‖p^i_{q,r}‖ ≠ 1 and/or ‖p^j_{q,r}‖ ≠ 1 then this
would indicate loss of synchronisation. The number of source symbol errors in an error
event is simply ‖p^i_{q,r}‖. We define the average error span, E_s, of a VLEC code over the
BSC as the average number of source symbol errors in an error event, where the average is
taken over all error events. Similarly, we define the average effective error span, E_{s,eff},
of a VLEC code as the average Levenshtein distance L(a_i, a_j).
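
Since the effective error span is defined through the Levenshtein distance between the correct and the decoded message sub-sequences, a small reference implementation may help. The sketch below assumes the standard definition (minimum number of single-symbol insertions, deletions and substitutions, each of unit cost), which is taken here to match the one used in Section 2.9.1; messages are represented as Python strings of source symbols.

```python
def levenshtein(a, b):
    """Levenshtein distance between sequences a and b (unit-cost edits).

    Classic dynamic programme that keeps only the previous row, so the
    memory use is proportional to len(b).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i symbols of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For an error event in which one transmitted symbol is decoded as a different single symbol, the distance is one, which corresponds to the case of no synchronisation loss discussed above.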

Lemma 5.1: The average effective error span of a VLEC code is given by

    E_{s,eff} = \frac{P_s(E)}{P(E)}        (5.1)

where P(E) is the error event probability and P_s(E) is the SEP as calculated using the
algorithm given in Section 2.9.2.

Proof: From the algorithm given in Section 2.9.2,

    P_s(E) = \frac{\sum L(a_i, a_j)}{|a_t|}        (5.2)

where a_i and a_j are, respectively, the correct and incorrect message sub-sequences
corresponding to an error event and a_t is the transmitted message sequence. The sum is
taken over all error events. The error event probability, in turn, is given by

    P(E) = \frac{\text{Total number of error events}}{|a_t|}        (5.3)

Hence, dividing equation (5.2) by (5.3),

    \frac{P_s(E)}{P(E)} = \frac{\sum L(a_i, a_j)}{\text{Total number of error events}} = E_{s,eff}        (5.4)

■

Therefore, the synchronisation properties of VLEC codes over the BSC may be
determined from the error event and the symbol error probabilities. Unfortunately, the
upper bounds on P_s(E) and P(E) cannot be applied to bound E_{s,eff}. However, we may
obtain a useful relationship for the average effective error span of a VLEC code for very
small p. This is achieved in Theorem 5.1. However, first we need the following lemma.



Lemma 5.2: The error event probability of a VLEC code at high SNR is
approximately given by

    P(E) \approx A_{d_{free}} P_{d_{free}}        (5.5)

where A_{d_free} is the average number of converging pairs of paths at free distance d_free
and P_{d_free} is the probability of decoding a path into another one at distance d_free over
the BSC.

Proof. The proof is similar to the one given for Corollary 3.1 on page 90 and is thus
omitted. In this case, the upper bound on the error event probability given by expression
(3.31) must be used. ■

Theorem 5.1: The average effective error span of a VLEC code with free distance d_free
over the BSC with very small p is approximately given by

    E_{s,eff} \approx \frac{B_{d_{free}}}{A_{d_{free}}}        (5.6)

where A_{d_free} is the average number of converging pairs of paths at free distance d_free,
and B_{d_free} is the average pairwise Levenshtein distance of these paths.

Proof. The required relationship is obtained by substituting for P(E) and P_s(E) in
equation (5.1) using the approximate relationships given respectively by expressions (5.5)
and (3.37). ■

Note that the minimum possible value for E_{s,eff} (and E_s) over the BSC is one symbol,
signifying that in an error event there is a single symbol decoded in error (otherwise there
would be no error event) but no synchronisation loss. In fact, for fixed-length codes (over
the BSC) E_s = E_{s,eff} = 1 symbol always.

In order to obtain similar expressions for the average error span, we define the
average number of source symbols in all converging pairs of paths whose encoded
messages are at a Hamming distance h from each other, C_h, as follows

    C_h = \sum_{k \ge 1} \; \sum_{(i,j) \in G_{0,k},\; H(p^i_{0,k},\, p^j_{0,k}) = h} \|p^i_{0,k}\| \, P(p^i_{0,k})        (5.7)

where p^i_{0,k} is the ith path from state S_0 to state S_k, having ‖p^i_{0,k}‖ source symbols
and probability P(p^i_{0,k}), G_{0,k} is the set of all pairs of path indices corresponding to
paths which diverge at state S_0 and merge again for the first time at state S_k, and
H(p^i_{0,k}, p^j_{0,k}) is the Hamming distance between the encoded paths p^i_{0,k} and
p^j_{0,k}. The values for C_h may be determined using the same algorithms as those given in
Section 3.5.1.1.

Theorem 5.2: The average error span for a VLEC code with free distance d_free over
the BSC for very small p is approximately given by

    E_s \approx \frac{C_{d_{free}}}{A_{d_{free}}}        (5.8)

Proof. The proof is similar to that of Theorem 5.1 and is omitted. ■

The definition for the average error span as p → 0 is the same as that given by Maxted
and Robinson [1985] for (non-error-correcting) exhaustive variable-length codes under
single-bit errors, since for these codes a single-bit error will always cause an error event.
Indeed, for the special case of exhaustive variable-length codes with d_free = 1,
approximations (5.6) and (5.8) become equalities, since in this case all possible error
events are accounted for by paths contributing to A_{d_free}. Using (5.8), the same values
for E_s as derived by Maxted and Robinson [1985] using state model techniques are obtained
for various Huffman codes. For instance, for code C12, given as Code 1 in Table VI in
their paper and reproduced here in Table 5.1, A_1 = 2.200, B_1 = 3.574 and C_1 = 3.7625.
Hence, using expression (5.8), E_s = 1.7102, which is the same as that calculated in the
above-mentioned paper.
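
The arithmetic behind this example is a pair of ratios. The sketch below assumes that approximation (5.6) reads E_{s,eff} ≈ B/A and approximation (5.8) reads E_s ≈ C/A, with A, B and C evaluated at the free distance; the function name is hypothetical and the numbers are those quoted above for code C12.

```python
def error_spans(a, b, c):
    """Return (E_s, E_s_eff) from the free-distance coefficients.

    a: average number of converging path pairs at the free distance,
    b: average pairwise Levenshtein distance of those paths,
    c: average number of source symbols in those paths.
    Assumed relations: E_s = c / a (5.8) and E_s_eff = b / a (5.6).
    """
    return c / a, b / a


e_s, e_s_eff = error_spans(a=2.200, b=3.574, c=3.7625)
print(f"E_s = {e_s:.4f}")
```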



Symbol    Probability    Code C12
a         0.4            01
b         0.2            00
c         0.2            11
d         0.1            100
e         0.1            101

Table 5.1: Huffman code given in Maxted and Robinson [1985]

As an example of the variation of the average effective error span with the value of
the cross-over probability over the BSC, we give in Figure 5.1 that for code C17 with free
distance five, given in Table A.2, when used to encode the 26-symbol English source. For
this code, A_5 = 2.3076, B_5 = 3.8543 and C_5 = 3.6209; hence for small p, E_{s,eff} ≈ 1.67.
We note, unfortunately, that the approximation given by (5.6) in this case is not very
accurate. This is because we are ignoring paths which are at distance d_free + 1, which,
since in this case d_free is odd, have the same probability of being decoded in error as
those at distance d_free. The interactions between these paths, however, are not easy to
analyse.

Figure 5.1: Variation of average effective error span with cross-over probability for C17
(horizontal axis: cross-over probability, p)

For all VLEC codes considered, the average (effective) error span increased with
increasing p; however, for certain non-error-correcting variable-length codes this behaviour
may be inverted. For an example of such a code see Rahman and Misbahuddin [1989].

Care should be exercised when using expressions (5.6) and (5.8) for E_s and E_{s,eff}. For
instance, for code C3 given in Table 2.3 (with any source), for very small p, E_s = E_{s,eff} = 1.
This of course does not mean that this code is immune to loss of sync. Rather, as p
becomes very small, when an error event occurs, with large probability this will be caused
by the incorrect decoding of a codeword into one of the same length. This is brought about
by the fact that the free distance for this code is three, but the unequal length free distance
is five. Hence, the probability that an error event consists of more than one source symbol
is negligibly small compared with that for single source symbol error events for very small
p. Note, however, that the overall performance of the code depends also on P(E).



5.3. Synchronisation Recovery without Start of Message 

When the channel alters the number of received bits from that transmitted, one of the 
main assumptions for maximum likelihood decoding used in Chapter 3 will no longer be 
valid, since knowing the number of received bits N will no longer be enough to determine 
the set of possible paths transmitted. In fact, under this channel behaviour it is certain that 
the modified Viterbi decoding algorithm for VLEC codes will make a decoding error. 
However, it will be interesting to investigate the effects of these types of channel errors on 
the performance of this algorithm. 

The first case we are going to consider in this section is that when the initial bits in 
the transmitted message are lost, such as when the receiver is not initially synchronised 
with the transmitter and there are no special symbols indicating the start of message. 
Another example is when the receiver starts decoding in the middle of a message. In what 
follows we are going to assume that there are no further errors on the channel during the
re-synchronisation period.

Suppose that n bits are lost at the start of the message. Since the decoding algorithm
does not attempt to correct insertion and/or deletion errors, we will consider decoding
errors only from the first complete codeword received. If the bits lost constitute
an exact sequence of codewords, then obviously, under the assumption that no further
errors occur on the channel, there will be no synchronisation loss and hence no decoding
errors as far as we are concerned^1. Hence, without loss of generality, we are going to
consider that the n bits lost are contained within a single codeword. Let the length of this
codeword be l. If the decoder knows that it is still acquiring synchronisation, then it does
not make sense to assign a metric of zero to state S_0 and minus infinity to all the others as
required by the modified Viterbi algorithm. Rather, the decoder must assign a metric of
zero to all initial states, since it does not know which of these corresponds to the first
received bit. Starting with this assumption, we can prove the following strong result.

1 Here we are assuming that the VLEC code itself is uniquely decodable and has a finite decoding delay.



Theorem 5.3: For a non-sequentially catastrophic VLEC code C (s\@L\, b\\ S2@I<2, 
b 2 \ Sa@la> b a \ ^min, c min ) with gcd(Zi, Z, 2 , — , L a ) = 1 and c m in * 0, the modified Viterbi 
algorithm (with all initial path metrics set to zero) will synchronise on the first complete 
codeword received, provided that there are no further channel errors. 

Proof. Suppose that the number of bits lost from the first incomplete codeword 
received is n y where 0 < n < L a . Note that if n = 0, the first bits received will in fact form a 
complete codeword. Now, if n is not a multiple of gcd(Zi, Z,2> £<r), then the received 
sequence is in a state which does not exist in the trellis. Hence in this case the decoder will 
never synchronise 2 . Therefore, in order to ensure that the received sequence is in a state 
which does exist for any value of «, we need that gcd(ii, Z 2 , • • LJ) = 1 . 

Now, suppose that the decoder never synchronises with the original message, as
shown in Figure 5.2. It is given that both the received and the decoded sequences are
assigned an initial metric of zero. Furthermore, since we are assuming that there are no
further channel errors, the final metric on the correct path will be zero. Hence, the
only way a maximum likelihood decoder for a VLEC code can make an incorrect decoding
is when the incorrect path also has a final metric of zero. If the two paths are assumed to
be infinitely long, then this could only happen if the two paths, when padded with a finite
number of appropriate codewords at both ends so as to make them of equal length, are at a
finite distance from each other. However, since the code is non-sequentially catastrophic



Figure 5.2: Incorrect synchronisation after n initial bits lost
(diagram: the received sequence and the decoded sequence are drawn as rows of codewords, each
starting with metric = 0 and offset from one another)



² This is the case, for instance, with fixed-length codes, where, if the number of lost bits is not a multiple of
the codeword length, a maximum likelihood decoder will never synchronise by itself.






this is impossible, so at some point the decoder must re-synchronise. So, at least, the
decoder has a finite synchronisation delay. However, at the point of synchronisation, the
final two codewords in the two paths must be at least distance $c_{\min}$ from each other.
Therefore for non-zero $c_{\min}$, the out-of-sync path cannot have a metric of zero, and hence the
decoder will choose the correct path and will synchronise on the first complete codeword
received. ■
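The two easily checked preconditions of Theorem 5.3, and the two ways of initialising the path metrics, can be sketched as follows. This is only an illustrative sketch: the function names and the list-based representation are assumptions made here, and the check for sequential catastrophicity is not attempted.

```python
from functools import reduce
from math import gcd, inf

def lengths_allow_resync(lengths):
    """True if the codeword lengths have no common factor (gcd = 1)."""
    return reduce(gcd, lengths) == 1

def initial_metrics(num_states, acquiring):
    """Initial path metrics for the first num_states trellis states.

    acquiring=True : every state starts at 0 (decoder knows it is out of sync).
    acquiring=False: state S_0 starts at 0 and all others at minus infinity,
                     as in the normally initialised modified Viterbi algorithm.
    """
    if acquiring:
        return [0.0] * num_states
    return [0.0] + [-inf] * (num_states - 1)

print(lengths_allow_resync([10, 11, 12]))   # True
print(lengths_allow_resync([4, 8]))         # False (e.g. lengths L and 2L)
print(initial_metrics(3, acquiring=False))  # [0.0, -inf, -inf]
```

The second call illustrates why lengths sharing a common factor cannot recover from the loss of a single bit.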
If the decoder does not know that synchronisation has not been acquired and tries to
decode the received message normally when $n$ bits are lost at the start of the message, then
the initial metrics assigned to the trellis states are as given in the modified Viterbi
algorithm in Section 3.2.2.2, i.e. state $S_0$ is assigned metric 0 and all other states minus
infinity. In this case Theorem 5.3 does not hold and the synchronisation delay is longer.
Figure 5.3 gives the average effective error span for code Cm given in Table A.2 when used 
to encode the 26-symbol English source, under this condition, for various numbers of initial
bits lost, obtained using computer simulation. Note that in calculating the average effective
error span, deleted or partially corrupted message source symbols are not counted;
however, all decoded source symbols are considered, even those relative to the possibly
partially received codeword at the beginning of the message. From this figure we may note



[Plot: horizontal axis "Number of Initial Bits Lost, n / bits", 0 to 50]
Figure 5.3: Effective error span for code Cn with normally assigned initial path metrics 




that starting from a random position from within the message results in an average effective 
error span of just 1.2 symbols. In this case the synchronisation process is similar to when 
bits are lost within the message. Hence an explanation of the synchronisation mechanism 
will be deferred to the next section. 

5.4. Synchronisation Recovery on Channels with Symbol 
Deletions and Insertions 

In the previous section we considered the special case when the decoder is initially 
out of synchronisation. A more general model for the channel is one which allows channel 
symbol deletions and insertions. As was discussed in Chapter 2, these kinds of errors have 
a disastrous effect on fixed-length error-correcting block codes. They could also present 
problems in the case of convolutional codes, however here the decoder can recover 
synchronisation with minimal increase in complexity using some clever techniques (see for 
example Sodha and Tait [1992] ). In any case, however, for both block and convolutional 
error-correcting codes the decoder must be modified so as to be able to recover 
synchronisation after a channel symbol deletion or insertion. As we shall show empirically 
here, this is not necessary in the case of VLEC codes. 

5.4.1. Symbol Deletions 

Figure 5.4 shows the variation of the average effective error span with the number of
consecutive bits deleted after 200 bits are received, for the codes C15, C16 and C17 given in
Table A.2 when used to encode the 26-symbol English source. This was obtained using
computer simulation. Again note that the average effective error span includes all the
incorrectly decoded symbols, including the ones relative to incomplete codewords
received, but not the message source symbols which are not completely received. Notice
that the maximum average effective error span is just over 1.6 symbols. Considering the
fact that these codes and the decoding algorithm were not designed to combat deletion
errors, their performance is extremely good. Surprisingly, the performance of the







Figure 5.4: Performance for codes C15, C16, and C17 for a number of consecutive bits
deleted after bit position 200 (horizontal axis: Number of Consecutive Bits Deleted / Bits)

catastrophic code C15 is as good as that for C17, which has a good CDF growth rate (see
Chapter 4).

The re-synchronisation process for a VLEC code
$C\,(s_1@L_1, b_1;\ s_2@L_2, b_2;\ \ldots;\ s_\sigma@L_\sigma, b_\sigma;\ d_{\min}, c_{\min})$
with a maximum likelihood decoder may be explained as follows.
Consider first a single bit deleted from a codeword of length $l$, resulting in a received word
$w_d$ of length $l-1$. If $\gcd(L_1, L_2, \ldots, L_\sigma) \neq 1$, then it is impossible for the decoder to
regain synchronisation, as was explained in the previous section. Hence, for automatic
re-synchronisation it is required that $\gcd(L_1, L_2, \ldots, L_\sigma) = 1$. If $l-1 = L_i$ for some
$i = 1, 2, \ldots, \sigma$, then the decoder may decode $w_d$ into a single codeword $c_d$, with $|c_d| = l-1$, resulting in
no synchronisation loss, as shown in case (a) in Figure 5.5. Again assuming that there are
no further errors on the channel, for this to occur we need $H(w_d, c_d)$ to be smaller than the
distance of all incorrectly synchronised merged paths to the received one, an example of
which is shown in case (b) in Figure 5.5. Unfortunately we cannot say much about the
distance of a sequence like that of case (b) to the received sequence, other than that it must
be at least distance $c_{\min}$. Requiring $w_d$ to be at some minimum diverging distance from all
codewords of $C$ would be too severe a constraint and would result in low rate codes. For a




sequentially catastrophic code it may also be possible to find an infinitely long unmerged
path to the received sequence which is at finite distance. If this distance is less than
$H(w_d, c_d)$, this means that the decoder will never re-synchronise. Clearly, therefore, if
synchronisation is important, we should avoid using sequentially catastrophic codes.
Conversely, if the code is not sequentially catastrophic, then the synchronisation delay will
be bounded. Case (b) will also arise if there are no codewords of length $l-1$. Hence, it is
advantageous to have consecutive codeword lengths, i.e. $L_1 = L_2 - 1 = \cdots = L_\sigma - \sigma + 1$.
Also, the larger $\sigma$ is, the more probable it is that a codeword of length $l-n$ exists. Therefore for
good synchronisation properties $\sigma$ must be chosen as large as possible.
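The role of consecutive codeword lengths can be illustrated with a small check of whether a codeword shortened by $n$ deleted bits still has a valid codeword length, which is necessary for case (a). The two length sets below are hypothetical (both have gcd equal to one) and the function name is an assumption of this sketch.

```python
def shortened_length_exists(lengths, n):
    """For each codeword length l > n, report whether l - n is itself a
    codeword length, so that decoding into a single codeword is possible."""
    length_set = set(lengths)
    return {l: (l - n) in length_set for l in sorted(length_set) if l > n}

consecutive = [8, 9, 10, 11]   # hypothetical consecutive lengths
sparse = [8, 11, 14]           # hypothetical lengths with gaps

print(shortened_length_exists(consecutive, 1))
# {8: False, 9: True, 10: True, 11: True}
print(shortened_length_exists(sparse, 1))
# {8: False, 11: False, 14: False}
```

With gaps between the lengths, a single deletion always forces the decoder into a case (b) style recovery, whereas with consecutive lengths only the shortest codewords do.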

The re-synchronisation mechanism described above also applies when the number of
deleted bits is greater than one, except that the probability that the corrupted word has
length not equal to any codeword length increases. In addition, another effect starts to
become important. When two or more bits are deleted, the probability that two adjacent
codewords are affected increases. Hence we would expect the average effective error
span to increase as the number of deleted bits increases. However, when $L_1$ or more bits are
deleted, there is a probability that the deleted bits constitute a complete codeword, again
ensuring that there is no synchronisation loss. In addition, the actual code construction has
a direct influence on the synchronisation behaviour. Notice from Figure 5.4 that initially
the effective error span increases with increasing number of bits deleted, until around the



Figure 5.5: Synchronisation recovery under a single bit deletion
(diagram rows: transmitted sequence, with the deleted bit marked; received sequence;
decoded sequence, case (a); decoded sequence, case (b))






minimum codeword length it starts to decrease again. This will reach a minimum when the
number of bits deleted is approximately equal to the average codeword length, which for
codes C15, C16 and C17 is approximately 8.5 bits, because the probability of deleting a
complete codeword is highest in this case.

Figure 5.6 shows the probability distribution of the effective error spans for codes
C15, C16 and C17 when four consecutive bits are deleted after bit position 200, which for all
three codes results in the maximum average effective error span (see Figure 5.4).

As an example of the effect of $\sigma$ on the effective error span, consider Figure 5.7,
which gives the effective error span for codes C18 and C19 given in Table A.2 when used to
encode the 26-symbol English source. For C18, $\sigma = 3$ while for C19, $\sigma = 5$. Notice the large
difference in performance between these two codes and even between these and the
previous codes considered, for which $\sigma = 8$, 8 and 11 respectively for C15, C16 and C17.

5.4.2. Symbol Insertions 

Channel symbol insertion is complementary to channel symbol deletion, whereby one 
or more random bits may be inserted within the transmitted message. The re- 




Figure 5.6: Probability distribution of the effective error span for codes C15, C16,
and C17 (horizontal axis: Effective Error Span / Symbols)







Figure 5.7: Performance for codes C18 and C19 for a number of consecutive bits
deleted after bit position 200 (horizontal axis: Number of Consecutive Bits Deleted / Bits)



synchronisation process is very similar to that described in the previous section for symbol 
deletion. Figure 5.8 shows the average error span for the three codes C15, C16, and C17
when used to encode the 26-symbol English source, as it varies with the number of 
consecutively inserted random bits after bit position 200 within the transmitted message. 
In this case it was considered to be more appropriate to measure the average error span, 
since the number of erroneously decoded symbols will increase with the number of inserted 
bits, whereas the number of complete source symbols received is at most one less than that 
transmitted, whatever the number of (consecutively) inserted bits. Again, the minimum 
value for the average error span occurs when the number of inserted bits is approximately 
equal to the average codeword length. 

5.5. Conclusion 

Theorem 5.3 shows that the synchronisation performance of non-sequentially 
catastrophic VLEC codes with codeword lengths having no common factor and with a non- 
zero minimum converging distance is excellent when the decoder knows that it is out of 
sync, since in this case it will synchronise on the first complete codeword received 







Figure 5.8: Performance for codes C15, C16 and C17 for a number of consecutive
random bits inserted after bit position 200 (horizontal axis: Number of Consecutive Bits Inserted / Bits)

(provided that the received sequence has an appropriately long period free of
errors). In addition, even when the decoder does not know that there is loss of
synchronisation, the average error span can be kept quite small. For the codes considered
here, the maximum average effective error span was found to be only 1.6 source symbols.
This occurred under bit deletion errors. This figure can be kept low by designing the
code to have as many consecutive codeword lengths with no common factor as possible.
Note that this will also allow us to construct good VLEC codes with codeword lengths
matching the source statistics. Other constraints could be imposed on the VLEC codes to
improve their synchronisation properties; however, we feel that, to be effective, these will
drastically increase the average codeword length, making them impractical.

From the results presented in this chapter, we have seen that there is not
much difference in the synchronisation performance of sequentially catastrophic and
non-sequentially catastrophic VLEC codes. However, in the case of sequentially catastrophic
VLEC codes, for particular messages under specific error patterns, the decoder may never
achieve synchronisation. But in most cases the probability for these to occur is very small






and hence these do not affect the performance of the code. The performance penalty for 
sequentially catastrophic codes is even less than for strictly catastrophic codes as defined in 
Section 3.4.3, because in this case, although synchronisation may be delayed, the decoder 
may still be decoding correctly even when out of synchronisation. 

Designing the unequal length free distance of a VLEC code to be greater than its free
distance will make it more robust against loss of synchronisation, since the minimum block
distance of a code does not influence its synchronisation properties. Hence, when
synchronisation is of prime concern, it is advisable to make the unequal length free
distance greater than the free distance. However, we do not see this as advantageous as
regards the SEP performance, especially in the case of the AWGN channel. Recall from
Chapter 3 that the SEP performance of a VLEC code over the AWGN channel with hard
decision maximum likelihood decoding is very much dependent on its free distance (not
the unequal length free distance). In order to increase the unequal length free distance of a
code, its average codeword length must correspondingly increase. This will offset any gain
achieved through the increase in the unequal length free distance, since the latter has
minimal effect on the SEP.

Dunscombe [1988] and Escott [1995] have shown that two-length VLEC codes with 
codeword lengths L and 2L have very good synchronisation properties over the BSC. 
However, there are two problems with these. First, two-length codes are only one step 
away from fixed-length codes, which have perfect synchronisation properties over the 
BSC. In order to really match codeword lengths to the source statistics to achieve the 
minimum codeword length possible for a given free distance, we must allow any codeword 
length. In this case the average error span may increase, but the error event probability (for 
high SNR) will be lower than for the two-length codes, giving an overall improvement. A 
comparison between the performance of these codes over the AWGN channel with hard 
decision decoding will be presented in Chapter 6. The second problem, arising on channels 
which allow symbol insertion and/or deletion, is more serious, since in this case the 
performance of these codes is as bad as for fixed-length codes, in that the GCD of their 
codeword lengths is not equal to one. Hence for two-length codes a decoder may never 
recover synchronisation by itself over these channels. 






Chapter 6. 

Code Constructions 



6.1. Introduction 

In the previous chapters, we have defined some properties of VLEC codes relating to
their performance and characteristics. However, with the exception of $\alpha$-prompt codes
discussed in Chapter 2, we have deferred any mention of how to actually construct such
codes until this chapter.

As with standard error-correcting codes, we can sub-divide VLEC codes into linear
and non-linear codes. It turns out that a VLEC code can never be entirely linear. However,
by suitably sub-dividing a code, linear sub-structures may be defined. Linearity is
important because

• it allows simpler encoding and decoding algorithms to be used, by exploiting the 
additional mathematical structure; 

• it simplifies code construction, first by again utilising mathematical structures and 
secondly by reducing the possible domain from where the code is chosen. 

We will basically present two construction techniques for VLEC codes. The first, 
presented in Section 6.3, uses fixed-length linear codes and anticodes to build new VLEC 
codes, whereas the second, given in Section 6.4, uses a heuristic algorithm to perform a 
computer search for good VLEC codes. 

Finally, the performance of some VLEC codes is compared to that of standard 
schemes for source and error-control coding. 






Chapter 6 - Code Constructions 



6.2. Linear VLEC Codes 

Theorem 6.1: A non-trivial, uniquely decodable VLEC code is always non-linear.

Proof: Let $c_i$ and $c_j$ be two unequal length codewords of a VLEC code $C$. Note that
we can always find two such codewords in any non-trivial VLEC code $C$, since $C$ must
have at least two codewords and, for it to be classified as variable-length, these must
necessarily be of unequal length. Let $|c_i| = l_i$ and $|c_j| = l_j$. Then, for the code to be linear,
$c_i + c_i = 0_{l_i}$ and $c_j + c_j = 0_{l_j}$ must both be codewords in $C$, where $0_l$ represents the word with $l$
all-zero bits. However, if both $0_{l_i}$ and $0_{l_j}$ are codewords of $C$, then $C$ is not uniquely
decodable. This can be shown by considering the codeword sequence consisting
of $l_j$ consecutive $0_{l_i}$, which is indistinguishable from the codeword sequence consisting of $l_i$
consecutive $0_{l_j}$. Hence a uniquely decodable VLEC code can never be linear. ■
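The ambiguity used in the proof can be seen directly for a pair of hypothetical lengths, here $l_i = 2$ and $l_j = 3$. The values and names are chosen only for illustration.

```python
def zero_word(length):
    """All-zero word of the given length, as a bit string."""
    return "0" * length

l_i, l_j = 2, 3  # hypothetical unequal codeword lengths

seq_a = zero_word(l_i) * l_j   # l_j consecutive copies of the all-zero word of length l_i
seq_b = zero_word(l_j) * l_i   # l_i consecutive copies of the all-zero word of length l_j

# Both messages produce the same l_i * l_j zero bits, so the receiver
# cannot tell a 3-symbol message from a 2-symbol one.
print(seq_a == seq_b, len(seq_a))  # True 6
```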

Theorem 6.1 does not exclude us, however, from using linear sub-codes, or cosets of
linear sub-codes, within the VLEC codes. It does state, however, that these sub-codes must
be of fixed length. Thus, we can have two different linear structures imposed on VLEC
codes, called respectively horizontal and vertical linearity, shown diagrammatically in
Figure 6.1.

Definition 6.1: Define the vertical sub-codes of a VLEC code
$C\,(s_1@L_1, b_1;\ s_2@L_2, b_2;\ \ldots;\ s_\sigma@L_\sigma, b_\sigma;\ d_{\min}, c_{\min})$,
with $C = \{c_1, c_2, \ldots, c_s\}$, to be $V_1, V_2, \ldots, V_\sigma$, where $V_i$ is the collection of
segments $c_j^{L_{i-1}+1:L_i}$ taken from every codeword $c_j \in C$ with $|c_j| \ge L_i$,
$i = 1, 2, \ldots, \sigma$, $L_0$ is defined to be equal to 0 and
$c_j^{a:b} = c_{j_a} c_{j_{a+1}} \cdots c_{j_b}$ given that $c_j = c_{j_1} c_{j_2} \cdots c_{j_{l_j}}$, $c_j \in C$.
Then, a VLEC code $C$ is said to be vertically linear iff its vertical sub-codes
$V_1, V_2, \ldots, V_\sigma$ are all cosets of linear fixed-length codes.

Definition 6.2: Define the horizontal sub-codes of a VLEC code
$C\,(s_1@L_1, b_1;\ s_2@L_2, b_2;\ \ldots;\ s_\sigma@L_\sigma, b_\sigma;\ d_{\min}, c_{\min})$,
with $C = \{c_1, c_2, \ldots, c_s\}$, to be $H_1, H_2, \ldots, H_\sigma$, where
$H_i = \{c_j \in C : |c_j| = L_i\}$ contains the $s_i$ codewords of length $L_i$, $i = 1, 2, \ldots, \sigma$.
Then, a VLEC code $C$ is said to be horizontally linear iff its horizontal sub-codes
$H_1, H_2, \ldots, H_\sigma$ are all cosets of linear fixed-length codes.








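Figure 6.1: Horizontal and vertical sub-codes of a VLEC code
(diagram: the vertical sub-codes $V_1, V_2, \ldots, V_\sigma$ are column blocks ending at $L_1$, $L_2$, ..., $L_\sigma$ bits;
the horizontal sub-codes $H_1, H_2, \ldots, H_\sigma$ are the row groups of equal-length codewords)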

6.2.1. Vertically Linear VLEC Codes

The vertical sub-codes $V_1, V_2, \ldots, V_\sigma$ as defined above are the same as the segment
decomposition of $C$, $\{Q_1, Q_2, \ldots, Q_\sigma\}$, as given in Section 2.6.2, i.e. $V_i = Q_i$, $i = 1, 2, \ldots, \sigma$.
Hence it is natural that vertically linear VLEC codes are well suited to be decoded using 
the segment decoding algorithm given in Section 2.6.3. It is interesting to note that in any 
vertical sub-code we may have repeated codewords. This is not normal for an ordinary 
fixed-length code. Here, however, the ambiguity between the repeated codewords is 
resolved by the other codeword segments in the other vertical sub-codes. 

One of the reasons for using a linear code is that it facilitates the decoding process. 
Now, we have seen that for good coding gain we must decode VLEC codes using either 






maximum likelihood decoding (Chapter 3) or sequential decoding (Chapter 4). In both 
these algorithms, the parallel transitions must be resolved immediately. Hence, in order to 
simplify the decoding process for both these algorithms, it is advantageous to have a linear 
structure on the parallel transitions. Vertical linearity does not help much in this respect 
because it does not simply resolve which transition to decode for each group of parallel 
transitions. For this reason, we shall not discuss any further vertically linear VLEC codes. 

6.2.2. Horizontally Linear VLEC Codes 

If a VLEC code $C\,(s_1@L_1, b_1;\ s_2@L_2, b_2;\ \ldots;\ s_\sigma@L_\sigma, b_\sigma;\ d_{\min}, c_{\min})$ is horizontally
linear with horizontal sub-codes $H_1, H_2, \ldots, H_\sigma$, the modified Viterbi decoding algorithm
given in Section 3.2.2.2 may be simplified as follows. Instead of computing the branch
metric for each codeword in $C$ (Step 2), $H_1, H_2, \ldots, H_\sigma$ are used to determine the $\sigma$ most
likely codewords, one for each possible codeword length. The metrics for these $\sigma$
codewords are then computed normally and these will then be the codewords stored for the
transitions $S_i \to S_{i+L_j}$, $j = 1, 2, \ldots, \sigma$. Hence, instead of computing the metric for $s$ codewords,
this is now computed for $\sigma$ codewords. In addition, of course, we now need $\sigma$
linear decoders to determine the most likely codeword for each possible length. Similarly,
in the stack decoding algorithm for VLEC codes given in Section 4.3, $H_1, H_2, \ldots, H_\sigma$ are
used to determine the $\sigma$ most likely codewords (of different lengths). These will then be
used in Step 2 to extend the path at the top of the stack. Hence in this case, only $\sigma$ new
paths are generated instead of $s$ and Step 3 becomes redundant.

It is obvious from the definition of $H_1, H_2, \ldots, H_\sigma$ that the minimum block distance
$b_i$ corresponding to the codeword length $L_i$ will be equal to the minimum distance of $H_i$.
Hence, if the overall minimum block distance is $b_{\min}$, then the minimum distance of each
one of the horizontal sub-codes must be at least $b_{\min}$. This is a
relatively easy requirement to satisfy, since for a given number of codewords and block
length we can determine, using tables, the minimum distance achievable using a
fixed-length linear code. However, we also require that the various horizontal sub-codes must
have a certain distance from each other, determined by the minimum diverging and





converging distances, $d_{\min}$ and $c_{\min}$ respectively. The construction algorithm given in the
next section splits a fixed-length code into cosets with specified minimum distances and
then uses an anticode [Farrell, 1970] [Farrell & Farrag, 1974] [Farrell, 1977] to remove
certain columns of some of these cosets. This guarantees an overall minimum block
distance and a minimum diverging distance; however, the minimum converging distance
necessary to give the required free distance may not always be satisfied.

6.3. Code-Anticode Construction 

Given that we want to construct a binary horizontally linear VLEC code with $s$
codewords and free distance $d_{free}$, we proceed as follows (code-anticode construction):

1. Find a binary linear fixed-length code $\mathcal{F}$ with parameters $(n, k, d)$ with minimum $n$
such that $2^k \ge s$ and $d = d_{free}$, where $n$ is the block length, $2^k$ is the number of
codewords and $d$ the minimum distance.

2. Let $d_{\min} = \lceil d_{free}/2 \rceil$.

3. Rearrange the columns of $\mathcal{F}$ such that the rightmost $m$ columns form an anticode $\mathcal{A}$
with parameters $(m, k, \delta)$ with maximum $m$ for $\delta = d_{free} - d_{\min}$, where $m$ is the block
length, $2^k$ is the number of codewords, and $\delta$ is the maximum distance for the
anticode.

4. Re-order the codewords of $\mathcal{F}$ such that repeated codewords of $\mathcal{A}$ are consecutive.

5. Perform simple row operations and column permutations on the generator matrix for
$\mathcal{F}$ such that the generator matrix of $\mathcal{A}$ contains the maximum number of consecutive
0's in the topmost positions of each column, starting from the rightmost column.

6. Delete the rightmost $m$ columns of the first $s_1$ codewords in $\mathcal{F}$, where $s_1$ is the number
of codewords with identical $m$ rightmost columns.

7. Considering the remaining codewords in $\mathcal{F}$, delete the rightmost $m-1$ columns of the
next $s_2$ codewords in $\mathcal{F}$, where $s_2$ is the number of codewords with identical $m-1$
rightmost columns.

8. Repeat Step 7 with the next $m-2$, $m-3$, ... rightmost columns until there are no
further columns to delete.
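The design parameters fixed by Steps 1 to 3 follow directly from $s$ and $d_{free}$, as in the sketch below. The function name is an assumption made here; finding the actual code and anticode with these parameters still requires tables or a search, which this sketch does not attempt.

```python
def design_parameters(s, d_free):
    """Return (k, d_min, delta) for the code-anticode construction.

    k      : smallest dimension with 2**k >= s              (Step 1)
    d_min  : design minimum diverging distance, ceil(d_free / 2)  (Step 2)
    delta  : maximum distance allowed for the anticode      (Step 3)
    """
    k = max(1, (s - 1).bit_length())
    d_min = (d_free + 1) // 2
    delta = d_free - d_min
    return k, d_min, delta

# 26 source symbols with free distance five.
print(design_parameters(26, 5))  # (5, 3, 2)
```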






Note that Step 5 will ensure that the maximum number of codewords with the same 
rightmost columns are generated, hence resulting in a VLEC code with a shorter average 
codeword length. 

Theorem 6.2: The VLEC code constructed using the code-anticode construction from
the fixed-length code with parameters $(n, k, d)$ and the anticode with parameters $(m, k, \delta)$ is
a horizontally linear VLEC code with overall minimum block distance at least equal to $d$
and minimum diverging distance at least equal to $d - \delta$.

Proof. Step 1 in the code-anticode construction ensures that all codewords are at least
distance $d$ from each other. Now, deleting the rightmost $m$ columns of the first
$s_1$ codewords in $\mathcal{F}$ (Step 6) results in two sub-codes, one of length $n - m$ with $s_1$ codewords
(sub-code $H_1$) and one of length $n$ with $s - s_1$ codewords (sub-code $H_2$). Since the deleted
columns from $H_1$ are identical (Step 6), the codewords of $H_1$ form a linear code with
minimum distance at least $d$. Also, since $H_2$ is a sub-code of the original code $\mathcal{F}$ (with the
same length), it also has minimum distance at least $d$. Hence the overall minimum
block distance of the VLEC code $\{H_1, H_2\}$ is at least $d$. In addition, since the $m$ rightmost
columns of $\mathcal{F}$ form an anticode with maximum distance $\delta$, deleting the $m$ rightmost
columns from $\mathcal{F}$ will result in a code with minimum distance $d - \delta$. Hence, the minimum
diverging distance between the codewords of $H_1$ and $H_2$ must be at least $d - \delta$. Exactly
the same reasoning applies for all the other columns deleted. ■

Unfortunately, as one may observe from Theorem 6.2, this construction algorithm
does not guarantee a minimum converging distance. Since with this construction
$d_{\min} < b_{\min}$, then using the bound given by expression (3.18) we require $c_{\min} = b_{\min} - d_{\min}$
such that $d_{free} \ge b_{\min}$. The required value of $c_{\min}$ for a given free distance may be obtained
by adding a suitable modification vector to the code (see Section 2.4.2.1) and/or
performing column permutations in such a way as not to affect the values of $b_{\min}$ and $d_{\min}$.
In general there are $2^{L_\sigma}$ possible modification vectors, each of length $L_\sigma$, to test, since
adding a modification vector of length $L_\sigma$ to each codeword in $C$ may affect the minimum






converging distance but not the minimum block and diverging distances¹. On the other
hand, the total number of column permutations possible without affecting $d_{\min}$ and $b_{\min}$ is
$\prod_{i=1}^{\sigma} (L_i - L_{i-1})!$, since only columns within a vertical sub-code may be permuted.
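The size of the two search spaces can be evaluated for a given set of codeword lengths. The sketch assumes that the permutation count is the product of the factorials of the vertical sub-code widths $L_i - L_{i-1}$ (with $L_0 = 0$); the lengths 10, 11 and 12 and the function names are used here purely as an illustration.

```python
from math import factorial, prod

def num_modification_vectors(lengths):
    """Number of binary modification vectors of length L_sigma (the longest length)."""
    return 2 ** max(lengths)

def num_column_permutations(lengths):
    """Product of (L_i - L_{i-1})! over the vertical sub-codes, with L_0 = 0."""
    ordered = sorted(set(lengths))
    widths = [b - a for a, b in zip([0] + ordered, ordered)]
    return prod(factorial(w) for w in widths)

lengths = [10, 11, 12]
print(num_modification_vectors(lengths))  # 4096
print(num_column_permutations(lengths))   # 3628800
```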

However, even after performing these operations, the required converging distance still
may not be reached. In this case the only alternative is to increase the design value of $d_{\min}$
(in Step 2) and repeat the construction. This in general will decrease the maximum value
of $m$ possible for $\mathcal{A}$, resulting in a longer average codeword length for $C$. Now, however,
the required value of $c_{\min}$ is likewise decreased, making it easier to find a suitable
modification vector or column permutations. Note that in most cases the total number of
column permutations possible may be too large to test exhaustively, and in practice it becomes more
feasible to find a modification vector instead. We will now illustrate the code-anticode
construction with an example.

Example 6.1: Consider that we want to construct a horizontally linear VLEC code for
the 26-symbol English source with free distance five. Since $s = 26$, we must
choose $k = 5$. An optimum fixed-length linear code with $k = 5$ and minimum distance five
has block length $n = 13$. The generator matrix for this code is shown in Figure 6.2.

    1 0 0 0 0 1 0 1 0 0 1 1 0
    0 1 0 0 0 0 0 0 1 1 1 0 1
    0 0 1 0 0 1 1 1 0 0 0 0 1
    0 0 0 1 0 0 1 1 1 0 1 0 0
    0 0 0 0 1 1 1 0 0 1 1 0 0

Figure 6.2: Generator matrix for (13,5,5) fixed-length linear block code

From Step 2, the design minimum diverging distance is $\lceil 5/2 \rceil = 3$. Hence, we need to
find an anticode with maximum distance $5 - 3 = 2$. The maximum value for $m$ possible in
this case is 3. This is achieved by choosing two columns of $\mathcal{F}$ and their modulo-2 sum as
the third column [Farrell, 1977]. Three such columns in $\mathcal{F}$ are columns 2, 3 and 13, where



1 Note that when adding the modification vector to a shorter length codeword, only the corresponding number 
of bits from the leftmost position in the modification vector are considered. 






the leftmost one is column 1. Hence, rearranging these columns to be in the rightmost
positions, and performing simple row operations on the generator matrix of $\mathcal{F}$ so as to
arrange its codewords to satisfy Steps 4 and 5, we get the generator matrix given in
Figure 6.3. The relative codebook is shown in Figure 6.4.



























    1 0 0 1 0 1 0 0 1 1 0 0 0
    0 1 0 0 1 1 1 0 1 0 0 0 0
    0 0 1 1 1 0 0 1 1 0 0 0 0
    0 0 0 0 0 0 1 1 1 0 1 1 0
    0 0 0 1 1 1 0 0 0 0 0 1 1

Figure 6.3: Rearranged generator matrix for the (13,5,5) fixed-length linear block code
with (3,5,2) anticode in the rightmost position



By deleting the columns indicated by Steps 6-8 in the code-anticode construction,
shown shaded in Figure 6.4, we obtain an (8@10,5; 8@11,5; 16@12,5; 3,1) VLEC code,
C13. Since the minimum converging distance in this case is one, the free distance may
be less than five (using the bound given by expression (3.18)). In fact, for this code the
free distance is four. In order to increase the converging distance without affecting the
minimum diverging distance and the overall minimum block distance we can either add an
appropriate modification vector to all the codewords in the code, or else perform column
permutations. In this example there are 4,096 possible modification vectors and 3,628,800
different column permutations. In fact, adding the modification vector 1111100000000 to
code C13 will give the required minimum converging distance of two and hence a free
distance of five. Figure 6.5 gives the resultant code, C14, plus the required generator
matrices and modification vectors for the constituent horizontally linear sub-codes.
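Steps 6 to 8 can be replayed on this example with a short sketch. The generator rows are transcribed from Figure 6.3 and the modification vector from the text above; the helper names, the bit-string representation and the brute-force distance computations are assumptions of this sketch. The diverging distance is taken here as the minimum Hamming distance between a shorter codeword and the equal-length prefix of a longer one. The converging distance is not evaluated.

```python
from itertools import product
from collections import Counter

# Rearranged generator matrix, transcribed from Figure 6.3.
G = ["1001010011000",
     "0100111010000",
     "0011100110000",
     "0000001110110",
     "0001110000011"]
M = 3  # anticode length m

def encode(msg):
    """Modulo-2 sum of the generator rows selected by the message bits."""
    word = [0] * len(G[0])
    for bit, row in zip(msg, G):
        if bit:
            word = [w ^ int(r) for w, r in zip(word, row)]
    return "".join(map(str, word))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Codebook ordered with row 1 as the fastest-changing message bit.
remaining = [encode(msg[::-1]) for msg in product((0, 1), repeat=len(G))]

# Steps 6-8: delete t = m, m-1, ..., 1 rightmost columns from the codewords
# sharing the rightmost t columns of the first remaining codeword.
code = []
for t in range(M, 0, -1):
    if not remaining:
        break
    suffix = remaining[0][-t:]
    code += [c[:-t] for c in remaining if c[-t:] == suffix]
    remaining = [c for c in remaining if c[-t:] != suffix]
code += remaining  # codewords left over keep the full length

def block_distance(words):
    return min(hamming(a, b) for a in words for b in words
               if len(a) == len(b) and a != b)

def diverging_distance(words):
    return min(hamming(a[:len(b)], b) for a in words for b in words
               if len(a) > len(b))

def add_modification(words, vector):
    """Add the leftmost len(c) bits of the vector to every codeword c."""
    return ["".join(str(int(x) ^ int(v)) for x, v in zip(c, vector)) for c in words]

lengths = Counter(len(c) for c in code)
modified = add_modification(code, "1111100000000")

print(sorted(lengths.items()))                                 # [(10, 8), (11, 8), (12, 16)]
print(block_distance(code), diverging_distance(code))          # 5 3
print(block_distance(modified), diverging_distance(modified))  # 5 3
```

The last two lines illustrate that the modification vector leaves the minimum block and diverging distances unchanged, which is why it can be searched for independently of the earlier design steps.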

The encoding process is likewise simpler, because now the source symbol to be
encoded determines which sub-code to use and the corresponding information bits are
generated ($k_1$, $k_2$ or $k_3$). These will then be used to calculate the codeword using the
respective generator matrix and modification vector.






(Retained bits are listed first; the deleted columns, shaded in the original figure, are shown in brackets.)

    0000000000 [000]
    1001010011 [000]
    0100111010 [000]
    1101101001 [000]
    0011100110 [000]
    1010110101 [000]
    0111011100 [000]
    1110001111 [000]
    00000011101 [10]
    10010111011 [10]
    01001101001 [10]
    11011001111 [10]
    00111010001 [10]
    10101110111 [10]
    01110100101 [10]
    11100000011 [10]
    000111000001 [1]
    100010001101 [1]
    010100101001 [1]
    110001100101 [1]
    001001011001 [1]
    101100010101 [1]
    011010110001 [1]
    111111111101 [1]
    000111111010 [1]
    100010110110 [1]
    010100010010 [1]
    110001011110 [1]
    001001100010 [1]
    101100101110 [1]
    011010001010 [1]
    111111000110 [1]

Figure 6.4: Codebook for (13,5,5) code and the derived VLEC code, C13



The main disadvantage of this construction algorithm is that the codeword lengths
are not matched to the source statistics. In fact the only thing that can be done to reduce
the average codeword length in this case is to assign the shorter codewords to the more
probable source symbols. Using this mapping, code C14 when used to encode the
26-symbol English source given in Table A.1 gives an average codeword length of 10.46 bits.
The non-optimality of such a construction may be deduced from the fact that if the shortest
length codewords are rearranged such that the last column contains consecutive 0's and 1's
and if these 0's are deleted, this will give the VLEC code (4@9,5; 4@10,5; 8@11,5;






Sub-code $H_1$ (8 codewords of length 10):

1111100000   0110110011   1011011010   0010001001
1100000110   0101010101   1000111100   0001101111

$H_1 = [k_1]\,G_0 + [1111100000]$

Sub-code $H_2$ (8 codewords of length 11):

11111011101   01101111011   10110101001   00100001111
11000010001   01010110111   10001100101   00011000011

$H_2 = [k_2]\,[G_0 \mid 0] + [11111011101]$

Sub-code $H_3$ (16 codewords of length 12):

111001000001   011100001101   101010101001   001111100101
110111011001   010010010101   100100110001   000001111101
111001111010   011100110110   101010010010   001111011110
110111100010   010010101110   100100001010   000001000110

$H_3 = [k_3]\,G_3 + [111001000001]$, where $G_3$ consists of $[G_0 \mid 00]$ and the additional row 000000111011

$G_0$ =
1001010011
0100111010
0011100110

Figure 6.5: VLEC code C14 (8@10,5; 8@11,5; 16@12,5; 3,2) and its horizontal linear sub-codes



10@12,5; 3,2) with free distance five and average codeword length of 10.09 bits. It is also
worthwhile to note that the design $d_{free}$ is not always reached (due to the minimum
converging distance) and it may be necessary to iterate the construction by increasing the
design value for $d_{min}$ to achieve the required $d_{free}$. With each iteration the average
codeword length will be increased.

6.4. Heuristic Construction Algorithm 

In order to solve the problems with the code-anticode construction, a heuristic
construction algorithm was devised, which uses a computer search. The aim of this
algorithm is to construct a VLEC code with specified overall minimum block, diverging
and converging distances (and hence a minimum value for $d_{free}$) and with codeword lengths
matched to the source statistics so as to give a minimum average codeword length for the
specified free distance and the specified source.

We will now describe the basic construction concept. First, a minimum codeword
length, $L_1$, must be specified. This must be at least equal to the minimum
diverging distance required. Then, a fixed-length code with this length and with minimum
distance equal to $b_{min}$ with a maximum number of codewords must be constructed. For a
horizontally linear VLEC code, all fixed-length codes constructed with this algorithm must
be linear or a coset. Since this construction uses a computer search, this fixed-length
code is also constructed using a computer search. One such search technique is the greedy
algorithm [Pless, 1992] which will be given in Section 6.4.1. Let the code so constructed
be C and let the number of codewords in C be $s_1$. Next, all the possible $L_1$-tuples which are
at distance at least $d_{min}$ from the codewords of C are listed. Let this set of words be W. Obviously,
if $d_{min} \ge b_{min}$, then W will be empty. However, the bound on $d_{free}$ given by expression
(3.18) suggests that it is best to choose $b_{min} = d_{min} + c_{min}$. Therefore we may take
$d_{min} = \lceil d_{free}/2 \rceil$ and $b_{min} = d_{free}$, in which case $d_{min} < b_{min}$ and such words are possible to find.
So for the moment we are going to assume that W is not empty. Note that the minimum
distance of the words in W is not specified for now. Next, the number of words in W is
doubled by increasing the words' length by one bit, affixing first a '0' and then a '1' to
the rightmost position of all words in W. So now W contains words of length $L_1+1$. These
words are then checked against the codewords in C. Those words in W which satisfy the
minimum converging distance required are retained; the others are discarded. So at the end
of this operation, we are left with a set of words which, when compared to the codewords of
C, satisfy the required minimum diverging and converging distances. The only other
requirement left to satisfy is that these words, being of the same length, must have
minimum distance at least equal to $b_{min}$. So here again we must choose the maximum
number of words ($s_2$) from what is left in W such that these words form a fixed-length code
with minimum distance at least $b_{min}$. Algorithms to achieve this are given in Sections 6.4.1
and 6.4.2 for horizontally linear and non-linear VLEC codes respectively. These words are
then added to the codewords already in C to form a VLEC code ($s_1$@$L_1$, $b_{min}$; $s_2$@($L_1+1$),
$b_{min}$; $d_{min}$, $c_{min}$). This whole procedure is then repeated by next considering all ($L_1+1$)-tuples
which satisfy the minimum diverging distance to all the codewords in C and are also
at minimum distance $d_{min}$ to those codewords in C of the same length, then affixing the
extra bit to these words, extracting those words which satisfy the minimum converging
distance and finally choosing the maximum number of words which satisfy the minimum
block distance. This algorithm stops either when there are no further possible words to be
found, or else when the required number of codewords is reached.
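One pass of the basic construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the diverging (converging) distance between two words of unequal length is taken to be the Hamming distance between the shorter word and the equal-length prefix (suffix) of the longer one, as defined earlier in the thesis, and a simple greedy selection stands in for the algorithms of Sections 6.4.1 and 6.4.2. All function names are hypothetical.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def diverging(x, y):
    """Distance between the prefixes of length min(len(x), len(y)) (assumed definition)."""
    m = min(len(x), len(y))
    return hamming(x[:m], y[:m])

def converging(x, y):
    """Distance between the suffixes of length min(len(x), len(y)) (assumed definition)."""
    m = min(len(x), len(y))
    return hamming(x[-m:], y[-m:])

def greedy_select(words, b_min):
    """Pick words in lexicographic order, keeping pairwise distance >= b_min."""
    chosen = []
    for w in sorted(words):
        if all(hamming(w, c) >= b_min for c in chosen):
            chosen.append(w)
    return chosen

def extend_once(code, b_min, d_min, c_min):
    """Add one group of codewords that is one bit longer than the current longest."""
    L = max(len(c) for c in code)
    # W: all L-tuples satisfying the minimum diverging distance to the code.
    W = ["".join(t) for t in product("01", repeat=L)]
    W = [w for w in W if all(diverging(w, c) >= d_min for c in code)]
    # Affix '0' and '1', then keep the words meeting the converging distance.
    W = [w + bit for w in W for bit in "01"]
    W = [w for w in W if all(converging(w, c) >= c_min for c in code)]
    # Finally enforce the minimum block distance within the new group.
    return code + greedy_select(W, b_min)

# Example: L_1 = 4 with b_min = 3, d_min = 2, c_min = 1.
start = greedy_select(["".join(t) for t in product("01", repeat=4)], 3)
code = extend_once(start, b_min=3, d_min=2, c_min=1)
print(code)
```

Calling `extend_once` repeatedly corresponds to the repetition described in the text; the sketch does not include the codeword deletion or length skipping introduced later in this section.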

The problem with this basic construction algorithm is that if $L_1$ is chosen to be too
small, not enough codewords may be found, whereas if $L_1$ is chosen to be too large, the
average codeword length would be non-optimal. In addition, the codeword lengths are not
matched to the source statistics. In order to have more control of the code construction, we
must alter the basic algorithm in two important aspects. First, when finding the maximum
number of words which satisfy the minimum block, diverging and converging distances
with some given length, we must allow the possibility of eventually dropping some of these
words. This may enable us to find more codewords of longer length than otherwise would
be possible, hence increasing the possibility of finding the required number of codewords.
The second alteration that is required is that some codeword lengths may be allowed to be
skipped, i.e. we may allow the affixing of more than one bit at a time to the set W of words
which satisfy the diverging distance.

Since this algorithm is general enough also to construct non-linear VLEC codes, we 
will first describe the general algorithm for any VLEC code and then point out the changes 
that must be incorporated in order to ensure that the VLEC code so constructed is 
horizontally linear. 

Figure 6.6 gives the flowchart for the full algorithm for constructing good VLEC
codes using the basic heuristic construction algorithm outlined above. This algorithm
incorporates the necessary alterations discussed above in order to find the necessary
number of codewords. So when it is no longer possible to find more words with the given
minimum diverging distance (i.e. no more codewords of longer length are possible), then
one or more codewords from the last group of codewords with the longest length in C is
deleted. How this is done is discussed in Section 6.4.3. By performing this operation,
more (longer) words may now be found which satisfy the minimum diverging distance.
When there is only one codeword left in the last group of codewords with the longest
length, then this is deleted from the code. The set of words W with length equal to the
longest codeword length in C satisfying the minimum diverging distance is again derived,
but this time more bits are affixed to these words. So if the previous time only one bit was
affixed, now two bits will be affixed. This will increase the number of words in W by four
times and hence more codewords satisfying the required conditions may now be found.
The details of this algorithm are shown in Figure 6.6.

These alterations to the basic algorithm will also make it possible to produce more 
than one VLEC code with the specified parameters and number of codewords, each one 
with a different codeword length distribution. In addition, the whole algorithm given by 
Figure 6.6 may be repeated by incrementing the starting codeword length $L_1$ to give yet
more VLEC codes. In order to limit this search, some maximum codeword length is also 
specified. The generation of this possible set of VLEC codes will enable us to match the 
codeword lengths to the source statistics in a rather brute force way by calculating the 
average codeword length for each code constructed and choosing that one which gives the 
minimum average codeword length for the given source. This procedure can be 
considerably sped up by incorporating the code selection process in the same construction 
algorithm, because for most of the codes constructed with this algorithm, the average 
codeword length becomes larger than for codes found earlier (or some specified value) 
even after the first few codewords are found, in which case the construction for that 
particular code is stopped and the next one considered immediately. 






[Flowchart. Recoverable process boxes: Start; Generate fixed-length code with minimum distance $b_{min}$ and length $L_1$; Derive all words satisfying the minimum diverging distance to the current code (call this set W); Affix one extra bit to all words in W; Extract all words from W which satisfy the minimum converging distance; Select maximum number of words to satisfy minimum block distance; Add words to code and make this the last group; Delete best codeword from last group; Go to previous group (deleting last group); Affix extra bits such that the new group contains more bits than the old one. The decision boxes and their YES/NO branches are not recoverable.]

Figure 6.6: Heuristic construction algorithm for VLEC codes






6.4.1. Choosing a Good Fixed-Length Coset Code from a Given Set of 
Words 

The greedy construction algorithm for a fixed-length block code with length n and
minimum distance d is as follows. List all $2^n$ n-tuple words in lexicographic order. The
first codeword chosen is the all-zero codeword. Let the code thus formed be C.² The next
codeword to be put in C is the next one in the list of n-tuples which is at a distance at least
d from all the other codewords in C. This procedure is repeated for all possible n-tuples. It
is very easy to see that the code so constructed has minimum distance d. Pless [1992] has
also proved that this code is linear. We shall use this algorithm later in Section 6.4.2 to
construct non-linear VLEC codes. However, since in the construction algorithm given in
Section 6.4 the codewords must be chosen from a fixed set of words which in general does
not contain all possible $2^n$ n-tuples, and since this will affect the linearity of the code,
we will use the following algorithm to construct a linear code or a coset. We shall call this
the search and add algorithm. Note that we will still assume that the words in the allowed
search space are in lexicographic order.

1. Pick the first word from the allowed set as the first codeword. This will also be
the coset leader.

2. Go down the ordered set and find the first word at distance d or more from the 
previously chosen set of codewords. If one is not found, then stop; else proceed to 
the next step. 

3. Add (modulo-2) this word to each previously selected codeword together with the 
coset leader (i.e., three words are added together in each case). If all the additions 
result in words in the allowed set, then choose all the words resulting from these 
additions as codewords, otherwise discard these words. 

4. Repeat from Step 2 until there are no other words satisfying the required conditions. 

Theorem 6.3: The code constructed using the search and add algorithm is a coset of a 
linear code with minimum distance d. 



² So initially C has just one codeword.






Proof. The proof is by induction. Obviously, when choosing the first two codewords,
the code is trivially a coset with minimum distance d. Let $\mathcal{F}$ be an intermediate code so
constructed, and assume that it is a coset. Now Step 2 ensures that the next chosen word c
is at Hamming distance at least d from all codewords in $\mathcal{F}$. Since the remaining words are
generated through the addition of c to all codewords in $\mathcal{F}$ together with the coset leader,
then obviously the resultant set of words is also a coset (with the same coset leader). Let
$a, b \in \mathcal{F}$, where a is the coset leader and b some other codeword not equal to a. Then
$H(a, b) \ge d$. Also, $H(a, c) \ge d$ and $H(b, c) \ge d$ as required by Step 2. Now the set of words
generated by Step 3 are of the form $a + b + c = e$. But $H(e, a) = H(a + b + c, a) =
W(a + b + c + a) = W(b + c) = H(b, c) \ge d$. Similarly, $H(e, b) \ge d$. Hence the resultant code
is a coset with minimum distance d. ■
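The search and add algorithm of Steps 1 to 4 can be sketched as follows. This is a minimal illustration, assuming words are represented as integers (so that modulo-2 addition is the XOR operator) and that a candidate rejected in Step 3 is simply skipped on later passes; that last detail is an interpretation, since the steps do not spell it out. The function names are hypothetical.

```python
def hamming(x, y):
    """Hamming distance between two words stored as integers."""
    return bin(x ^ y).count("1")

def first_candidate(ordered, code, rejected, d):
    """Step 2: first word in the ordered set at distance >= d from every codeword."""
    for w in ordered:
        if w not in rejected and all(hamming(w, c) >= d for c in code):
            return w
    return None

def search_and_add(allowed, d):
    """Select a coset code with minimum distance >= d from the allowed words."""
    ordered = sorted(allowed)
    allowed_set = set(ordered)
    if not ordered:
        return []
    leader = ordered[0]              # Step 1: first word is the coset leader
    code = [leader]
    rejected = set()
    while True:
        candidate = first_candidate(ordered, code, rejected, d)
        if candidate is None:        # Step 2: nothing left, so stop
            return code
        # Step 3: add candidate + codeword + leader for every current codeword
        # (for the leader itself this yields the candidate).
        new_words = [c ^ candidate ^ leader for c in code]
        if all(w in allowed_set for w in new_words):
            code.extend(new_words)
        else:
            rejected.add(candidate)  # assumption: discard and keep searching

# With all 7-tuples allowed and d = 3 the result is a linear code.
full = search_and_add(range(128), 3)
print(len(full), [format(w, "07b") for w in full[:4]])
```

When the allowed set is the whole space, every addition in Step 3 succeeds and the code doubles in size at each pass; with a restricted set, some candidates are rejected and the resulting coset is smaller.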

6.4.2. Choosing a Good Fixed-Length (Non-Linear) Code from a Given Set 
of Words 

The greedy algorithm may be used directly to construct a fixed-length code of length 
n from a given set of words which may not necessarily contain all possible n-tuples. In this
case we will call this the restricted greedy algorithm. In fact, the code generated with this 
algorithm is, in general, non-linear. The advantage of this algorithm is that it is easy to 
implement and is very fast. The disadvantage is that the code produced is not always very 
good. 

A better algorithm is the following, which we shall refer to as the majority voting 
algorithm. This selects a good fixed-length code, with minimum distance d, from a given
set of words W. 

1. Initialise the required code C to the null code (i.e. no codewords selected).

2. For each word in W determine the number of other words in W which are at least
distance d from the given word.

3. Choose that word which has got the maximum number of other words which satisfy
the minimum distance requirement. If there is more than one such word, then choose
an arbitrary one. Put this word in the required code C.

4. Remove from W all those words which do not satisfy the minimum distance
requirement to the chosen codeword.

5. Repeat Steps 2-4 with the new W. Each time the selected codeword is added to the
code C. The algorithm ends when W is empty.
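The two selection methods of this section can be sketched side by side as follows. This is a minimal illustration, assuming words are bit strings, d is at least 1, and ties in Step 3 are broken by taking the first such word (the text allows an arbitrary choice, so a different tie-break may give a different code). The function names are hypothetical.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def restricted_greedy(words, d):
    """Greedy selection over a given set of words, in lexicographic order."""
    code = []
    for w in sorted(words):
        if all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code

def majority_voting(words, d):
    """Majority voting selection following Steps 1 to 5 (assumes d >= 1)."""
    W = list(words)
    code = []                                            # Step 1
    while W:
        # Step 2: for each word, count the words at distance >= d from it.
        scores = [sum(hamming(w, v) >= d for v in W) for w in W]
        best = W[scores.index(max(scores))]              # Step 3 (first on ties)
        code.append(best)
        # Step 4: drop every word closer than d to the chosen one
        # (this also removes the chosen word itself, since d >= 1).
        W = [v for v in W if hamming(best, v) >= d]      # Step 5: repeat
    return code

words5 = ["".join(t) for t in product("01", repeat=5)]
ga = restricted_greedy(words5, 3)
mva = majority_voting(words5, 3)
print(ga, mva)
```

Step 2 costs on the order of |W|² distance evaluations per selected codeword, which is the computational burden discussed in the text; the greedy pass needs only one sweep over the list.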

The main disadvantage with the above algorithm is that as the number of words in W
increases it becomes too computationally expensive and the restricted greedy algorithm
will then be preferred. Table 6.1 gives a comparison between the number of codewords
generated by the greedy algorithm (GA) and the majority voting algorithm (MVA) together
with the known maximum number of codewords possible [MacWilliams & Sloane, 1978]
for various values of n and d. The allowed set of words in this case was all possible
n-tuples. Note that except for the codes produced with n = 12 and 13 with d = 3, the MVA
finds at least as many codewords as the GA, and strictly more in several cases. The
performance of the MVA is even better when compared to the restricted greedy algorithm
when the set W does not contain all possible n-tuples.



            d = 3               d = 5               d = 7
  n     GA   MVA  Best      GA   MVA  Best      GA   MVA  Best
  5      4     4     4       2     2     2       -     -     -
  6      8     8     8       2     2     2       -     -     -
  7     16    16    16       2     2     2       2     2     2
  8     16    16    20       4     4     4       2     2     2
  9     32    36    40       4     6     6       2     2     2
 10     64    72    80       8    12    12       2     2     2
 11    128   129   160      16    24    24       4     4     4
 12    256   196   256      16    25    32       4     4     4
 13    512   356   512      32    48    64       8     8     8

Table 6.1: Comparing the number of codewords found using the greedy algorithm and the
majority voting algorithm when W contains all possible n-tuples



6.4.3. Deleting a Codeword 

We have seen that in order to increase the number of codewords possible in a given
code, some shorter length codewords must be deleted from the VLEC code such that more
longer length codewords may be found. The problem is which codeword or group of
codewords is best deleted so as to maximise the number of possible new (longer)
codewords.

In the case when the required code needs to be horizontally linear, we cannot
delete one codeword at a time, unless the codeword to be deleted happens to be in a
sub-code consisting of just two codewords; otherwise we would destroy the linearity. In this
case no optimisation was sought; instead the method adopted was to delete the lower half
of the respective sub-code. Since the codewords are in lexicographic order, this
would result in another linear code with half the number of codewords as in the original
one.

For the case of non-linear VLEC codes, some sort of optimisation was sought in 
order to maximise the number of codewords possible in a code. Hence when deleting a 
codeword from a code, the one deleted is that which will result in the greatest number of 
new words satisfying the minimum diverging distance. This is found through an 
exhaustive search. This of course is far from optimum but was found to give the best 
compromise between the computation speed and the resultant code optimality. There are in 
fact two main problems with this approach. First, the possible number of words that may 
be found satisfying the minimum diverging distance to the remaining codewords is 
maximised for each codeword deleted, in turn. However, this number could be further 
maximised if the deleted codewords are considered simultaneously. The second problem is 
that by deleting that codeword which gives the maximum number of words which satisfy 
the minimum diverging distance, it does not necessarily maximise the number of words 
which will also satisfy the minimum converging and block distances, even though in this 
case there would be more words to choose from. 
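The exhaustive search used for non-linear codes can be sketched as follows. This is a minimal illustration under stated assumptions: the diverging distance between a word and a shorter or equal-length codeword is taken to be the Hamming distance over the codeword's length, candidates are counted among words of the same length as the last group, and one codeword is deleted at a time. The function names and the example code are hypothetical.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def count_diverging_words(code, length, d_min):
    """Number of `length`-tuples whose prefix distance to every codeword is >= d_min."""
    count = 0
    for t in product("01", repeat=length):
        w = "".join(t)
        if all(hamming(w[:len(c)], c) >= d_min for c in code):
            count += 1
    return count

def best_codeword_to_delete(code, d_min):
    """Codeword of the last (longest) group whose removal frees the most words."""
    longest = max(len(c) for c in code)
    last_group = [c for c in code if len(c) == longest]

    def gain(c):
        remaining = [x for x in code if x != c]
        return count_diverging_words(remaining, longest, d_min)

    return max(last_group, key=gain)

example = ["0000", "0111", "10010", "11001", "11100"]
print(best_codeword_to_delete(example, d_min=2))
```

As the text points out, this criterion looks only at the diverging distance and at one deletion at a time, so it does not guarantee the largest number of words once the converging and block distances are also imposed.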

6.5. Comparing Constructions 

In order to assess better the qualities of the various construction algorithms presented
in the earlier sections, we compare in Table 6.2 the codes produced by each respective
algorithm for the 26-symbol English source given in Table A.1, for various values of $d_{free}$.
Note from this table that the codes with the shortest average codeword length are those
produced with the heuristic construction making use of the majority voting algorithm to
choose codewords of the same length satisfying the minimum block distance. This is
somewhat as expected, since the heuristic algorithm in general matches the codeword
lengths to the source statistics, and the majority voting algorithm is the best of the
algorithms considered here for choosing codewords of the same length with a specified
minimum distance.

Also, note from Table 6.2 that the horizontally linear codes constructed with the
heuristic algorithm are almost as good as the non-linear codes. However, on closer
examination we observe that the horizontally linear codes so produced have horizontal
sub-codes containing only a few codewords. This of course offsets somewhat any gain obtained
in having horizontal linearity. If we were to restrict the number of horizontal codes
produced, and hence increase the number of codewords per sub-code, this will result in
increased average codeword length, as shown in Table 6.3. It is interesting to note that one
of the codes produced in this way has the same parameters as C14 generated with the
code-anticode construction in Example 6.1.

Another important aspect of the heuristic construction algorithm is the design value
for $d_{min}$. It was stated that the best value for $d_{min}$ is $\lceil d_{free}/2 \rceil$. In fact, we could also choose
$d_{min}$ to be $\lfloor d_{free}/2 \rfloor$, although in this case, for odd $d_{free}$, it will result in a slightly slower
construction, since more words will be found satisfying $d_{min}$ in this case, and hence more
computations will be required. The reason for choosing this value of $d_{min}$ is so that $d_{min}$
and $c_{min}$ are nearly equal.³ If $d_{min}$ is greater than $\lceil d_{free}/2 \rceil$ then fewer words will be found
satisfying this value for $d_{min}$. Even though the corresponding value of $c_{min}$ required will be
smaller, the net result is that of increasing the average codeword length. Similarly, if $d_{min}$ is
chosen to be smaller than $\lfloor d_{free}/2 \rfloor$, the required minimum converging distance will
correspondingly increase and this will also reduce the number of words satisfying $c_{min}$.
This is shown in Table 6.4, where different values for $d_{min}$ were used to construct VLEC
codes with $d_{free}$ = 5 for the 26-symbol English source.



³ Note that $d_{min} + c_{min} = d_{free}$.






Method                                  Average   Code
                                        length

$d_{free}$ = 3

α-prompt construction                   7.830     (15@7,3; 11@14,3; 3,0)
(Section 2.6.2)

Code-anticode construction              8.096     (16@8,3; 16@9,3; 2,1)

Heuristic construction                  6.741     (1@4,-; 4@5,3; 1@6,-; 2@7,5; 4@8,3;
(Horizontally linear)                             2@9,6; 2@10,6; 4@11,3; 2@12,3; 4@13,3;
                                                  2,1)

Heuristic construction                  6.475     (2@5,4; 4@6,4; 7@7,3; 6@8,3; 6@9,3;
(Restricted greedy algorithm)                     1@10,-; 2,1)

Heuristic construction                  6.370     (1@4,-; 1@5,-; 5@6,3; 6@7,3; 3@8,3;
(Majority voting algorithm)                       4@9,3; 4@10,3; 1@12,-; 1@13,-; 2,1)

$d_{free}$ = 5

α-prompt construction                   11.03     (23@11,5; 3@19,5; 5,0)
(Section 2.6.2)

Code-anticode construction              10.46     (8@10,5; 8@11,5; 10@12,5; 3,2)

Heuristic construction                  8.729     (1@6,-; 1@7,-; 4@8,5; 4@9,5; 4@10,5;
(Horizontally linear)                             2@11,5; 2@12,8; 2@13,10; 2@14,7;
                                                  2@15,6; 2@16,6; 3,2)

Heuristic construction                  8.690     (1@6,-; 2@7,5; 2@8,5; 4@9,5; 4@10,5;
(Restricted greedy algorithm)                     4@11,5; 4@12,5; 4@13,5; 1@14,-; 3,2)

Heuristic construction                  8.467     (1@6,-; 1@7,-; 4@8,5; 5@9,5; 5@10,5;
(Majority voting algorithm)                       6@11,5; 4@12,5; 3,2)

$d_{free}$ = 7

Heuristic construction                  10.85     (1@7,-; 1@8,-; 2@9,7; 2@10,8; 2@11,8;
(Horizontally linear)                             2@12,10; 2@13,8; 1@14,-; 4@15,7;
                                                  4@16,7; 2@17,8; 3@18,7; 4,3)

Heuristic construction                  11.03     (1@8,-; 1@9,-; 2@10,7; 3@11,7; 4@12,7;
(Restricted greedy algorithm)                     5@13,7; 6@14,7; 4@15,7; 4,3)

Heuristic construction                  10.70     (1@7,-; 1@8,-; 1@9,-; 2@10,7; 2@11,8;
(Majority voting algorithm)                       4@12,7; 4@13,7; 5@14,7; 6@15,7; 4,3)

Table 6.2: Various codes for the 26-symbol English source constructed using different
algorithms






Average   Number of        Code parameters
length    sub-codes (σ)

8.729     11               (1@6,-; 1@7,-; 4@8,5; 4@9,5; 4@10,5; 2@11,5; 2@12,8;
                           2@13,10; 2@14,7; 2@15,6; 2@16,6; 3,2)

8.988     8                (2@7,5; 4@8,5; 4@9,5; 2@10,8; 2@11,6; 2@12,5; 4@13,6;
                           6@14,5; 3,2)

9.842     4                (4@9,5; 8@10,5; 8@11,5; 6@12,5; 3,2)

10.46     3                (8@10,5; 8@11,5; 10@12,5; 3,2)

Table 6.3: Horizontally linear VLEC codes for the 26-symbol English source with $d_{free}$ = 5
with various numbers of sub-codes



Design     Average      Code parameters
$d_{min}$     codeword
           length

5          11.02        (22@11,5; 4@16,5; 5,0)

4          10.06        (2@9,6; 8@10,5; 11@11,5; 1@12,-; 2@13,5; 2@14,6; 4,1)

3          8.467        (1@6,-; 1@7,-; 4@8,5; 5@9,5; 5@10,5; 6@11,5; 4@12,5; 3,2)

2          8.463        (1@6,-; 2@7,5; 4@8,5; 3@9,5; 4@10,5; 4@11,5; 5@12,5;
                        2@13,5; 1@14,-; 2,3)

1          10.02        (2@9,6; 9@10,5; 10@11,5; 1@12,-; 2@13,6; 2@14,6; 1,4)

0          11.02        (22@11,5; 4@16,5; 1,5)

Table 6.4: Variation of average codeword length with $d_{min}$ for the heuristic construction



Table 6.4 also serves to show the non-optimality of this construction algorithm.
Consider a VLEC code C ($s_1$@$L_1$, $b_1$; $s_2$@$L_2$, $b_2$; ...; $s_\sigma$@$L_\sigma$, $b_\sigma$; $d_{min}$, $c_{min}$) and let
$c = c_1 c_2 \cdots c_l$, $c \in C$. Define the swap operation on codeword c ($\tilde{c}$) to be

$$\tilde{c} = c_l\, c_{l-1} \cdots c_1. \tag{6.1}$$

Then the swap code of C, $\tilde{C}$, is the code $\{\tilde{c} : c \in C\}$. It is obvious that code $\tilde{C}$ has the
same performance as C with maximum likelihood decoding. In particular, the free distance
of both codes is the same. In fact the only difference between the two codes is the values
of the minimum diverging and converging distances, which are simply swapped, i.e. $\tilde{C}$ is a
($s_1$@$L_1$, $b_1$; $s_2$@$L_2$, $b_2$; ...; $s_\sigma$@$L_\sigma$, $b_\sigma$; $c_{min}$, $d_{min}$) VLEC code. Hence optimum VLEC
codes having swapped minimum diverging and converging distances must have the same
average codeword length. From Table 6.4 it is clear that in this case the construction
algorithm does not find codes with the same average codeword length when $d_{min}$ and $c_{min}$
are swapped and hence is definitely (in general) sub-optimum.
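The effect of the swap operation (6.1) on the minimum diverging and converging distances can be checked numerically with a short sketch. It assumes, as elsewhere in the thesis, that the diverging (converging) distance between two codewords of unequal length is the Hamming distance between the shorter codeword and the equal-length prefix (suffix) of the longer one. The function names are hypothetical, and the example uses a few codewords taken from Figure 6.5.

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def unequal_pairs(code):
    """All (shorter, longer) pairs of codewords with different lengths."""
    return [(x, y) for x in code for y in code if len(x) < len(y)]

def min_diverging(code):
    """Minimum prefix distance over pairs of unequal length (assumed definition)."""
    return min(hamming(x, y[:len(x)]) for x, y in unequal_pairs(code))

def min_converging(code):
    """Minimum suffix distance over pairs of unequal length (assumed definition)."""
    return min(hamming(x, y[-len(x):]) for x, y in unequal_pairs(code))

def swap_code(code):
    """Apply the swap operation (6.1): reverse every codeword."""
    return [c[::-1] for c in code]

sample = ["1111100000", "0110110011", "11111011101", "01101111011", "111001000001"]
swapped = swap_code(sample)
print(min_diverging(sample), min_converging(sample))
print(min_diverging(swapped), min_converging(swapped))
```

Reversal leaves every codeword length and every equal-length Hamming distance unchanged, so the average codeword length and the block distances are identical for the two codes; only the roles of the prefix and suffix distances are exchanged.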

6.5.1. Two-Length Error-Correcting Codes and VLEC Codes 

The α-prompt construction with $d_{free}$ = 3 given in Table 6.2 is in fact a two-length
error-correcting code as considered by Dunscombe [1988]. This is constructed from the
(7,4) Hamming base code. It is interesting to note that because this base code is perfect,
the performance of this two-length code is practically the same both with the prefix
decoding algorithm and with maximum likelihood decoding. However, due to the
restrictions on the possible codeword lengths, the code rate for this code is low when
compared to the best VLEC code constructed with the heuristic algorithm. In fact, for a
slightly lower code rate, we can even construct a free distance five VLEC code. The
performance of these codes is shown in Figure 6.7, together with their respective code
rates. Their relatively poor performance is the main disadvantage with two-length
error-correcting codes over the AWGN channel. In addition, as discussed in Chapter 5, these
codes also suffer from the same disadvantages as fixed-length codes over channels which
admit symbol deletion and/or insertion errors.



[Plot against $E_b/N_0$ (dB).]

Figure 6.7: Comparing the performance of a two-length error-correcting code with VLEC codes




6.6. Comparing Performance of VLEC Codes with Standard 
Coding Techniques 

Having discussed how to construct good VLEC codes, we now compare the 
performance of VLEC codes (with maximum likelihood decoding) to that of standard 
block and convolutional error-correcting codes. Also, since VLEC codes perform 
combined source and channel coding, then we will also compare their performance with 
standard cascaded techniques using separate source and error-correction coding. 

Figure 6.8 shows the performance of the VLEC code C20 (1@6,-; 1@7,-; 4@8,5;
5@9,5; 5@10,5; 6@11,5; 4@12,5; 3,2) given in Table A.3, with free distance five and an
average codeword length of 8.47 bits when used to encode the 26-symbol English source.
This is compared both with BCH error-correcting codes and a convolutional code of the
same minimum or free distance as appropriate. These codes were chosen so as to have
approximately equal code rates. Hence, where appropriate, a source code (a Huffman
code) was used to pre-encode the source before encoding with the error-correcting code.
The Huffman code was chosen since it is also a variable-length code. The Huffman code
for the 26-symbol English source has an average codeword length of 4.21 bits and hence a
code rate of 1.19. In the results presented here both the BCH codes and the convolutional
codes are decoded using hard decision, maximum likelihood decoding. For the BCH
codes, (n, k) denotes a code with block length n and k information bits while t gives the
number of correctable bit errors in a codeword, although, since maximum likelihood
decoding is used, more than this number of errors may be corrected for some codewords.
For the convolutional codes, K gives the constraint length.⁴ The generator sequences for
the convolutional codes used here, which give the maximal value for $d_{free}$ for the given rate
and constraint length, are taken from Table 11.1 of Lin and Costello [1983]. Note that the
average codeword length of C20 is much shorter than the codeword length of either of the
two BCH codes considered in Figure 6.8. For the case where no Huffman code is used, the
source is simply encoded using 5 information bits for each symbol. Note that the VLEC
code achieves a coding gain of slightly more than 0.5 dB at a SEP of $10^{-4}$ over the best of
the standard schemes, that using a cascaded Huffman/convolutional system, although the
(31,21) BCH code starts to get the upper hand at high SNR due to its higher code rate.

[Plot against $E_b/N_0$ (dB). Legend: Uncoded, R=1; Huffman code + (15,7) t=2 BCH code, R=0.590; (31,21) t=2 BCH code, R=0.6774; Huffman code + rate 1/2 convolutional code K=2, R=0.595; VLEC code C20; (Sequential decoding).]

Figure 6.8: Free/Minimum distance 5 codes used to encode the 26-symbol English source

Unfortunately, this modest coding gain is achieved at the expense of a large increase 
in the decoder complexity. Since with maximum likelihood decoding both VLEC and 
convolutional codes use practically the same algorithm (the Viterbi decoding algorithm), 
then it is easier to compare the complexity between these two schemes than between VLEC 
codes and block codes. Using similar reasoning as in Section 3.7, the number of 
comparisons and additions per source symbol required in the case of a rate k / n 
convolutional code with constraint length K are respectively given by 

$$\text{Number of comparisons per source symbol} = \frac{v\,(2^k - 1)\,2^K}{k} \tag{6.2}$$

$$\text{Number of additions per source symbol}^{5} = \frac{v\,n\,2^{k+K}}{k} \tag{6.3}$$

where v is the (average) number of information bits required to encode a source symbol.



⁴ Where $2^K$ gives the number of distinct states in the convolutional encoder.

⁵ Here it is assumed that the branch metric is computed for each output bit. Since in convolutional codes n is
usually small, this number may be reduced by a factor n by implementing a lookup table to determine the
branch metric for the n bits together.




Using equations (3.47) and (3.48) for VLEC codes and equations (6.2) and (6.3) for 
convolutional codes we require, on average, 212 comparisons and 2151 additions per 
source symbol in the case of the VLEC code while we only require, on average, 16.8 
comparisons and 67.2 additions per source symbol in the case of the convolutional code, 
where in this case, because of the use of the Huffman code, v = 4.20 bits per symbol. The 
complexity in decoding the Huffman code is ignored in this case, since this is negligible 
compared to the complexity of the Viterbi decoder. 
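The convolutional-code figures quoted above follow directly from equations (6.2) and (6.3), as the following minimal sketch shows. The function name is hypothetical; the parameter values for the first case (rate 1/2, K = 2, v = 4.20) are those stated in the text and in the legend of Figure 6.8, while K = 4 for the second case is an assumption inferred from the 80 comparisons and 320 additions quoted later for the free distance seven code.

```python
def viterbi_cost_per_symbol(v, k, n, K):
    """Comparisons and additions per source symbol for a rate k/n code, constraint length K.

    v is the average number of information bits per source symbol.
    """
    comparisons = v * (2 ** k - 1) * 2 ** K / k    # equation (6.2)
    additions = v * n * 2 ** (k + K) / k           # equation (6.3)
    return comparisons, additions

# Huffman code followed by a rate 1/2, K = 2 convolutional code (v = 4.20).
print(viterbi_cost_per_symbol(4.20, 1, 2, 2))

# Uncoded source (v = 5) with a rate 1/2 code; K = 4 is an inferred value.
print(viterbi_cost_per_symbol(5, 1, 2, 4))
```

Both counts grow exponentially with K but only linearly with v and n, which is why a convolutional code with a longer constraint length can still be cheaper to decode than the VLEC code.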

Figure 6.9 shows similar comparisons between the VLEC code C21 (1@7,-; 1@8,-;
1@9,-; 2@10,7; 2@11,8; 4@12,7; 4@13,7; 5@14,7; 6@15,7; 4,3) given in Table A.3, 
with average codeword length of 10.7 bits when used to encode the 26-symbol English 
source, and standard error-correcting codes with free/minimum distance seven. The (15,5) 
BCH code is preceded with the Huffman code for the English source. Again, the VLEC 
code performs slightly better than the half rate convolutional code. In order to match the 
code rate as much as possible, the convolutional code in this case is used with the uncoded 
source, i.e. v = 5 bits per symbol. The average number of comparisons and additions per 
source symbol for the VLEC code C21 are respectively 268 and 3488, whereas the 
corresponding numbers for the convolutional code are 80 and 320. Hence the decoding 
complexity for the VLEC code is an order of magnitude larger than that for the 
convolutional code with the same free distance. Therefore a larger free distance 
convolutional code may be used for the same complexity and in this case the performance 
of the convolutional code will be better than that of the VLEC code. 

Similarly, Figures 6.10 and 6.11 give comparisons for codes C22 (2@7,5; 3@8,5; 
4@9,5; 5@10,5; 8@11,5; 8@12,5; 8@13,5; 9@14,5; 9@15,5; 10@16,5; 13@17,5; 
11@18,5; 16@19,5; 17@20,5; 5@21,5; 3,2) with average codeword length of 9.60 bits and 
C23 (32@15,7; 32@16,7; 64@17,7; 5,2) with average codeword length of 15.1 bits, both 
given in Table A.4, when used to encode the 128-symbol ASCII source also given in this 
table. The source statistics for this source were derived from the source file of a C 
program. The Huffman code for this source has an average codeword length of 5.10 bits 
and a code rate of 1.38. Note that in Figure 6.10, the cascaded Huffman code with the 




[Figure 6.9 plot omitted: logarithmic vertical axis from 1e-07 to 1e-01, horizontal axis Eb/N0 (dB) from 0 to 8; the legend marks the sequential decoding curves.]

Figure 6.9: Free/Minimum distance 7 codes used to encode the 26-symbol English source

(31,21) BCH code performs better than the VLEC code at high SNR. This is mainly due to 
the fact that this cascaded scheme has a higher code rate. 
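The code descriptions used above pack several parameters into one string. The sketch below parses that notation under the reading suggested by the text: each group `s@L,b` gives the number of codewords s of length L and their minimum block distance b, and the final pair gives the minimum diverging and converging distances. The function names are hypothetical, the treatment of `-` as "no block distance" is an assumption, and the code rate is taken as 7 source bits (the 128-symbol source) divided by the average codeword length.

```python
def parse_vlec_spec(spec):
    """Parse 's1@L1,b1; s2@L2,b2; ...; dmin,cmin' into its components."""
    *groups, tail = [part.strip() for part in spec.split(";")]
    d_min, c_min = (int(x) for x in tail.split(","))
    subcodes = []
    for group in groups:
        count_and_length, block_distance = group.split(",")
        count, length = (int(x) for x in count_and_length.split("@"))
        # Assumption: '-' marks a sub-code for which no block distance is quoted.
        b = None if block_distance.strip() == "-" else int(block_distance)
        subcodes.append((count, length, b))
    return subcodes, d_min, c_min


C22 = ("2@7,5; 3@8,5; 4@9,5; 5@10,5; 8@11,5; 8@12,5; 8@13,5; 9@14,5; "
       "9@15,5; 10@16,5; 13@17,5; 11@18,5; 16@19,5; 17@20,5; 5@21,5; 3,2")
C23 = "32@15,7; 32@16,7; 64@17,7; 5,2"


def summarise(spec, average_length, source_bits=7):
    """Count codewords and distinct lengths, and compute the code rate."""
    subcodes, _, _ = parse_vlec_spec(spec)
    return {
        "codewords": sum(count for count, _, _ in subcodes),
        "lengths": len(subcodes),
        "rate": source_bits / average_length,
    }


print(summarise(C22, 9.60))
print(summarise(C23, 15.1))
```

Both descriptions account for all 128 source symbols, and the rate obtained for C22 agrees with the value R = 0.729 shown in the legend of Figure 6.10.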

Referring to Figure 6.10, the coding gain of C22 compared to the convolutional code
at a SEP of $10^{-5}$ is about 1 dB. If we compare the number of comparisons and
additions necessary in this case, we find that the convolutional code requires on average
20.4 comparisons and 81.6 additions per symbol, where due to the Huffman code, v = 5.10
bits per symbol, whereas to decode C22 we require 1219 comparisons and 19210 additions
per symbol. Notice the large difference in the number of additions required. This is due to
the fact that the branch labels in the trellis for the convolutional code contain only n bits (in
this case 2), whereas the branch labels for the VLEC code contain the codewords
themselves, in this case with a maximum codeword length of 18 bits. Also, the number of
transitions going to a state is $2^k$ in the case of convolutional codes, while the equivalent
number for VLEC codes is s.

For the codes in Figure 6.11, the convolutional code requires 112 comparisons and 
448 additions per symbol, whereas the VLEC code C23 requires 1919 comparisons and 
31429 additions per symbol. It is interesting to note that by increasing the free distance






[Figure 6.10 plot omitted: logarithmic vertical axis from 1e-07 to 1e-01, horizontal axis Eb/N0 (dB). Legend: Uncoded, R=1; (31,21) t=2 BCH code, R=0.677; Huffman code + (31,21) t=2 BCH code, R=0.931; Huffman code + rate 1/2 convolutional code K=2, R=0.687; VLEC code C22, R=0.729; (Sequential decoding).]

Figure 6.10: Free/Minimum distance 5 codes used to encode the 128-symbol ASCII source



from 5 to 7, the number of operations in the case of the convolutional code increases by a
factor of five. This is somewhat inflated by the fact that for the free distance 7 code, the
source is not encoded with the Huffman code, resulting in v = 7. If we allow for this fact,
then the increase is a factor of 3.5, whereas in the case of the VLEC codes the
corresponding increase is a factor of 1.5.

The code rate for code C23 having $d_{free} = 7$ is not very good, because this code was
constructed using the code-anticode construction and is hence a horizontally linear VLEC
code. This means, however, that for this code we can greatly reduce the decoding
complexity by decoding each sub-code independently using their linearity. If we ignore the
complexity required for this step, then in this case, since there are three fixed-length
sub-codes in C23, the number of comparisons and additions per symbol are respectively 30 and
725, which are comparable to those required by the convolutional code. The number of
additions may be further reduced to 15 if the decoder for the sub-codes also supplies the
metric value for the decoded codewords.






[Figure 6.11 plot omitted: logarithmic vertical axis (topmost tick 1e+00), horizontal axis Eb/N0 (dB).]

Figure 6.11: Free/Minimum distance 7 codes used to encode the 128-symbol ASCII source

The three sub-codes for C23 have parameters (15,5), (16,5) and (17,6). So comparing 
the decoding of this VLEC code to the BCH codes, we now require three decoders instead 
of just one. However these could be implemented in parallel to speed up the decoding. In 
the case of fixed-length block codes the decoding window moves n bits at a time, i.e. once 
a codeword is decoded, n bits from the received bit sequence are discarded and the next n 
bits are considered. In the case of maximum likelihood decoding of VLEC codes, this is 
not so. When the codeword lengths have no common factor, as in this case, the decoding 
window is moved only one bit at a time. Therefore the three sub-code decoders are used to 
decode codewords of lengths 15, 16 and 17, but then only one bit from the input bit 
sequence is discarded and the process repeated for the next decoding. Hence, since the 
average codeword length for C23 is 15.1 bits, on average we need to decode 3 × 15.1
codewords for each symbol transmitted, whereas for the BCH codes we only need to 
decode v/k codewords. 
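As a rough worked figure for the two counts just described: the VLEC value follows directly from the three sub-codes and the average codeword length of 15.1 bits, while the block-code value uses illustrative numbers that are assumptions made here (uncoded ASCII symbols with v = 7 bits and a (15,5) code with k = 5), not figures taken from the comparison in Figure 6.11.

```latex
\begin{align*}
N_{\mathrm{VLEC}} &= \sigma \, L_{average} = 3 \times 15.1 \approx 45.3
  \quad \text{sub-code decodings per source symbol}, \\
N_{\mathrm{BCH}} &= \frac{v}{k} = \frac{7}{5} = 1.4
  \quad \text{codeword decodings per source symbol (assumed } v = 7,\ k = 5\text{)}.
\end{align*}
```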

Also shown in Figures 6.8-6.11 is the performance of each corresponding VLEC 
code with the stack decoding algorithm given in Chapter 4, with a stack size of 50. 
Although all these codes, with the exception of C23, are sequentially catastrophic, we note 




that the performance is almost as good as that with maximum likelihood decoding.
However, comparing the required number of comparisons and additions necessary for these
two algorithms using equations (3.47), (3.48), (4.15) and (4.17), we deduce that, for high
SNR, the number of comparisons in the sequential algorithm is reduced by a factor of
$L_{average}(s-1)/(\xi(s-\sigma))$ and the number of additions by a factor of $L_{average}/\xi$, since for high SNR $\xi \to 1$. In
Table 6.5 we summarise the number of computations required for the codes considered
here. Note that "MLD" and "Sequential" in this table give the number of computations
required, for each corresponding VLEC code, with maximum likelihood and sequential
decoding respectively, while $L_{average}$ and $\sigma$ give respectively their average codeword length
and the number of different codeword lengths. We note that the number of computations
required for sequential decoding of the VLEC codes used to encode the 26-symbol source is
now comparable to that required to decode the convolutional codes. However, in the case
of the VLEC codes used to encode the 128-symbol source, the number of additions
required with sequential decoding is still an order of magnitude more than that required for
the convolutional codes. Again, for code C23 this number can be further reduced, as
discussed above, by taking advantage of the fact that this code is horizontally linear.





                26-symbol source                                    128-symbol source
                d_free = 5                d_free = 7                d_free = 5                d_free = 7
                Comparisons  Additions    Comparisons  Additions    Comparisons  Additions    Comparisons  Additions
Convolutional   16.8         67.2         80.0         320          20.4         81.6         112          448
MLD             212          2151         268          3488         1219         19210        1919         31429
Sequential      19           254          17           326          113          2001         125          2081
L_average       8.47                      10.7                      9.60                      15.1
σ               7                         9                         15                        3

Table 6.5: Comparing number of computations required for convolutional and VLEC codes
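The text states that, at high SNR, sequential decoding reduces the number of additions by a factor that approaches the average codeword length. The short check below uses only the values in Table 6.5 and compares the ratio of MLD to sequential additions with the listed average codeword length. The dictionary and function names are hypothetical.

```python
# (MLD additions, sequential additions, average codeword length) from Table 6.5
TABLE_6_5 = {
    "26-symbol, d_free=5": (2151, 254, 8.47),
    "26-symbol, d_free=7": (3488, 326, 10.7),
    "128-symbol, d_free=5": (19210, 2001, 9.60),
    "128-symbol, d_free=7": (31429, 2081, 15.1),
}


def addition_reduction(mld_additions, sequential_additions):
    """Factor by which sequential decoding reduces the number of additions."""
    return mld_additions / sequential_additions


for label, (mld, seq, l_avg) in TABLE_6_5.items():
    ratio = addition_reduction(mld, seq)
    print(f"{label}: reduction factor {ratio:.2f}, average length {l_avg}")
```

In all four columns the ratio agrees with the average codeword length to within rounding of the tabulated values.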






6.7. Conclusion 

The two constructions for VLEC codes given in this chapter both involve, to some 
degree, a computer search. In the case of the code-anticode construction, a computer 
search is necessary in order to find a suitable modification vector (or a column 
permutation) so as to get the required minimum converging distance. The problem in 
designing a VLEC code with a specified minimum converging distance in the first place is 
difficult because in computing the converging distance not all codewords are shifted the
same number of bits and the resultant structure will become non-linear even if the code is 
horizontally linear. Another disadvantage of this construction is that the resultant 
codeword lengths are not matched to the source statistics and consequently the resultant 
code will have non-optimal average codeword length for the given free distance. 

The second algorithm given to construct VLEC codes relies completely on a 
computer search and uses some heuristics to limit the search space. This construction may 
be adapted to give both horizontally linear and non-linear VLEC codes. The codes so 
constructed have codeword lengths matched to the source statistics and hence have a 
shorter average codeword length than codes constructed with the code-anticode 
construction. The disadvantage of this algorithm is that it becomes impractical for sources 
with a large number of symbols. The maximum number of symbols deemed practical with 
current computer technology is about 128 symbols. The main problem is the maximum 
codeword length; as this increases, the search space increases exponentially. 

No attempts were made to construct the horizontal sub-codes of horizontally linear 
VLEC codes as cosets of cyclic error-correcting codes. However we envisage no problems 
with this approach. This will further facilitate the encoding and decoding of VLEC codes. 

The number of codewords in a horizontal sub-code of a horizontally linear VLEC 
code must be a power of two. However, for the last sub-code, i.e. the one with maximum 
codeword length, this condition may be relaxed by having a number of unused codewords. 
This is necessary so as to obtain the required overall number of codewords for the VLEC 
code. This does not present any problems in decoding such a code. The only alteration 
necessary is to assign a large cost to these unused codewords, so that if one of them is 




chosen by the decoder of the corresponding sub-code (this is possible in the presence of 
noise, of course), then the modified Viterbi decoder would in any case disregard the 
corresponding path in the trellis. 

A simulated annealing approach [El Gamal et al., 1987] to the design of good VLEC
codes was also attempted, without much success. Although the algorithm works well for
the design of fixed-length codes⁶, it breaks down when it comes to VLEC codes. Here, the
energy function must incorporate not only the minimum distance but a host of other 
parameters, viz. minimum block, diverging and converging distances and the average 
codeword length. Also the fact that the codeword lengths are not equal makes it difficult to 
define a suitable energy function. For instance two short codewords may be only a 
specified distance from each other, but two long ones may be at a much larger distance 
more easily. Hence the algorithm tends to produce VLEC codes where the short 
codewords do not conform with the required parameters. 

It is interesting to note that the construction of equivalent VLEC codes is not trivial, 
where equivalent VLEC codes are codes having an identical distance spectrum. In fact the 
only equivalent code construction known to us which may be applied in general is that 
obtained through swapping the codewords (see Section 6.5). 

In the previous chapter we have seen that it is advantageous to construct non-catastrophic
VLEC codes. Unfortunately, neither of the constructions given in this chapter is
easily adapted to prevent the construction of catastrophic codes, and sometimes hand
tweaking would be necessary in order to avoid a catastrophic code. However, we do not
consider catastrophic behaviour of VLEC codes of prime importance with regard to their
performance.

Finally we have seen that although VLEC codes do achieve a slight coding gain over 
standard coding schemes of comparable coding rate and minimum/free distance, this is 
usually achieved at the expense of a large increase in the complexity of the decoder. For 



⁶ For instance, the maximum number of codewords found for a fixed-length code with block length 8 and
minimum distance 3 using simulated annealing was 20, which is indeed the maximum number possible
[MacWilliams & Sloane, 1978].




high SNR the decoding complexity may be drastically reduced in the case of VLEC codes
by using sequential decoding. But obviously this is also a solution for convolutional codes,
although for VLEC codes the size of the required stack is relatively much smaller. The
numbers of comparisons and additions are indicative, but they are not always a reliable basis
for comparing the complexity of two decoding algorithms, since they depend heavily
on the actual implementation of the decoder. Equations (3.47) and (3.48) assume that the
decoding is being achieved in software in a sequential manner. However, if the metric
computation is done in hardware, for instance, then the number of required additions may
be greatly reduced. For instance, if the maximum codeword length is $L_\sigma$ then two $L_\sigma$-bit
registers together with an $L_\sigma$-bit binary adder may be used to evaluate the metric of any
codeword in two clock cycles. In addition a number of these computations may be
performed in parallel. Hence it should be feasible to implement a VLEC maximum
likelihood decoder in hardware operating at reasonable speeds.
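To illustrate why a whole-codeword metric can be far cheaper than one addition per bit, here is a minimal software analogue. It assumes hard decisions on the BSC, so that the metric of a codeword is its Hamming distance from the received bits, and it uses a single XOR followed by a population count. This is a stand-in chosen for illustration, not the register-and-adder circuit described above, and the function name is hypothetical.

```python
def hamming_metric(received_bits: int, codeword_bits: int) -> int:
    """Hard-decision metric: number of bit positions in which the two words differ."""
    difference = received_bits ^ codeword_bits   # one XOR over the two "registers"
    return bin(difference).count("1")            # population count of the difference


received = 0b1011001   # hypothetical 7-bit received word
codeword = 0b1001101   # hypothetical 7-bit codeword
print(hamming_metric(received, codeword))
```

Both operations act on the whole word at once, which is the property the hardware argument relies on.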






Chapter 7. 

Conclusion 



In this thesis we have introduced a novel representation for VLEC codes which 
incorporates the spatial memory inherent in these codes due to their variable-length nature. 
This led to the derivation (in Chapter 3) of a maximum likelihood decoding algorithm 
based on the Viterbi algorithm. The performance of VLEC codes with this algorithm is 
much better than with the instantaneous algorithms found in the literature (see Chapters 2 
and 3) and is also in fact slightly better than the performance of standard cascaded source 
and channel coding schemes employing Huffman codes (for source coding) and BCH or 
convolutional codes (for channel coding) with similar code rates and the same minimum or 
free distance (see Chapter 6). However, the main objective for combining source and 
channel coding is to obtain a reduction in the overall system complexity for the same 
performance. Unfortunately, we have shown in Chapter 6 that the complexity of maximum 
likelihood decoding for VLEC codes is much more than that for the separate schemes. The 
decoding complexity for VLEC codes at relatively high SNR may be reduced by using 
sequential decoding (Chapter 4). For VLEC codes with adequate CDF growth rate, the 
performance with sequential decoding is the same as with maximum likelihood decoding, 
but the decoding complexity is decreased by a factor of $L_{average}/\xi$ (for high SNR) and
becomes, in some cases, comparable to that of the standard cascaded schemes. 

The decoding complexity may be further reduced by using horizontally linear VLEC
codes. The reduction factor is influenced by the number of horizontal sub-codes used; the
fewer the sub-codes, the greater the reduction in the decoding complexity.
However, this must be balanced against a possible decrease in the code rate due to
non-matching of codeword lengths to the source statistics, resulting in a longer average
codeword length.

By using the modified Viterbi algorithm, soft-decision decoding for VLEC codes is 
very easy to implement and the decoding complexity is not increased much. The decoding 
complexity may be further reduced by performing trellis decoding for each of the 
horizontal fixed-length sub-codes¹ [Wolf, 1978] [McEliece, 1994]. It may even be possible
to reduce the overall complexity by combining these two trellis structures, hence removing 
any redundancy in the number of states and/or transitions. 

The main problem in using variable-length codes in general over the BSC is that of 
synchronisation, whereby simple substitution errors may cause a long error propagation 
because of loss of synchronisation. With VLEC codes the problem still exists; however, 
now some error patterns may be corrected and hence loss of synchronisation is avoided in 
these instances. By using maximum likelihood decoding this problem is reduced even 
further (at the expense of increased complexity) by checking every synchronisation position 
possible and choosing that message which gives the minimum number of bit errors². This
is done in the most efficient way possible by using either the Viterbi algorithm or, for high 
SNR, the stack algorithm. The instantaneous decoding strategies for VLEC codes treated 
in the literature (see Chapter 2) still fail miserably when they lose synchronisation. This 
problem with the instantaneous algorithms may be reduced by using two-length VLEC 
codes. However this is a half-way approach to combined source and channel coding in that 
it is not very effective in performing good source coding, since the codeword lengths 
cannot be matched well with the source statistics. We have also shown in Chapter 5 that 
these codes have practically the same disadvantages as fixed-length codes over channels 
which admit symbol deletions and insertions, whereas VLEC codes with codeword lengths 
having no common factor do not. 



¹ This may be possible both for horizontally linear and non-linear sub-codes, although in the latter case it may
be more difficult to implement.

² On the BSC this will also be the most likely one.




From the bounds obtained in Section 3.5.1 we have seen that the single most
important parameter that affects the performance of VLEC codes over the BSC with
maximum likelihood decoding is the free distance³. For most VLEC codes considered, the
bound on $d_{free}$ given by expression (3.18) was found to be met with equality. This is
mainly due to the fact that in VLEC codes there always exist very short merged paths. If
the code contains codewords of equal length, then the shortest merging occurs with paths
with just one source symbol. In any case, there will always exist merged paths with just
two source symbols. Hence, the three parameters $b_{min}$, $d_{min}$ and $c_{min}$ are very indicative of
the performance of a VLEC code, since in most cases the free distance will be directly
determined by these parameters.
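The role of the three parameters can be sketched in code. Expression (3.18) is not reproduced in this chapter, so the sketch assumes it has the form d_free >= min(b_min, d_min + c_min). It also assumes the usual definitions: block distance between codewords of equal length, diverging distance between a shorter codeword and the equal-length prefix of a longer one, and converging distance between a shorter codeword and the equal-length suffix of a longer one. The three-word code is hypothetical and is not taken from the thesis.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def vlec_distances(codewords):
    """Return (b_min, d_min, c_min); None where no pair of that kind exists."""
    block, diverging, converging = [], [], []
    for a, b in combinations(codewords, 2):
        short, long_ = sorted((a, b), key=len)
        if len(short) == len(long_):
            block.append(hamming(short, long_))
        else:
            diverging.append(hamming(short, long_[:len(short)]))    # aligned at the start
            converging.append(hamming(short, long_[-len(short):]))  # aligned at the end

    def smallest(values):
        return min(values) if values else None

    return smallest(block), smallest(diverging), smallest(converging)


def free_distance_bound(b_min, d_min, c_min):
    """Assumed form of the bound: d_free >= min(b_min, d_min + c_min)."""
    candidates = []
    if b_min is not None:
        candidates.append(b_min)
    if d_min is not None and c_min is not None:
        candidates.append(d_min + c_min)
    return min(candidates)


example = ["000", "0110", "1011"]   # hypothetical code
print(vlec_distances(example), free_distance_bound(*vlec_distances(example)))
```

Under the assumed form of the bound, the parameters quoted in Chapter 6 give min(7, 4 + 3) = 7 for C21, min(5, 3 + 2) = 5 for C22 and min(7, 5 + 2) = 7 for C23, which match the free distances stated for those codes.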

One direct consequence of the above is the interesting situation whereby the 
minimum block distance of the shortest length codewords can be the same as that for the 
longest length codewords, without affecting the performance of the code. From this fact 
we can deduce that there is a larger probability of correcting errors in the shorter 
codewords than in the longer ones. But this makes perfect sense also in light of the fact 
that the shorter length codewords are more probable (since they are mapped to the most 
probable source symbols). Hence, on average, the error-correcting capability of a VLEC
code is improved in this way. Therefore we can also claim that VLEC codes may offer
unequal error protection to different source symbols, offering more protection to the more 
probable ones. 

Using some of the newly defined properties of VLEC codes, we may characterise
classical properties of variable-length codes for the noiseless case using new definitions.
Hence, a code is non-singular iff $b_{min} > 0$. A code is uniquely decodable iff $d_{free} > 0$, since
any two merging paths in the trellis would be distinguishable. A code has a finite
decodable delay iff $d_{free} > 0$ and $d_c(\eta) > 0$ for some finite $\eta$, where $d_c(\eta)$ is the CDF for the
VLEC code. A code is instantaneously decodable (i.e. it is a prefix code) iff $d_{free} > 0$ and
$d_{min} > 0$. Finally, a code has a finite synchronisable delay iff the code is
non-sequentially-catastrophic and has $c_{min} > 0$ (see Theorem 5.3)⁴. It is also conjectured that a code is
exhaustive iff $d_{free} = 1$ and $A_1 = L_{average}$, where $A_1$ is the average number of converging pairs
of paths at Hamming distance one and $L_{average}$ is the average codeword length.

³ This was also found to be the case, using simulation, for sequential decoding of non-sequentially
catastrophic VLEC codes.

7.1. Scope for Further Research

Many open problems remain. Here we list some which we consider important. 

Upper and lower bounds on the possible average codeword length for VLEC codes 
with a given free distance are required. These are not easy to derive due to the multitude of 
parameters involved. Because VLEC codes are used for combined source and channel
coding, an optimum⁵ VLEC code for one particular source may be sub-optimal for a
different source. In addition, whereas for fixed-length codes the only parameter that needs
to be bounded, for a given number of codewords and minimum distance, is the block
length, in the case of VLEC codes the number of parameters is much larger. These are the
minimum and maximum codeword lengths, the number of different codeword lengths and 
the minimum block, diverging and converging distances required for the given free 
distance. These bounds, however, will enable us to judge how good a particular VLEC 
code is and may prompt further research to find a better construction for these codes. 
Furthermore, the heuristic construction algorithm, given in Section 6.4, which is not 
practical when constructing codes with a large number of codewords, has shown that the 
code-anticode construction is sub-optimal. Hence other construction techniques are 
required for large codes. 

Due to the good synchronisation properties of VLEC codes found in Chapter 5 under 
symbol insertion or deletion errors, it is worthwhile to investigate the performance of these 
codes over channel models which admit these types of errors (such as for instance over 
fading channels) and compare their performance with standard coding schemes. 



⁴ The proof is slightly different, in that the decoder is not attempting to correct errors and hence does not need
the GCD of the codeword lengths to be one, since the decoder does not need to assume that the first received
bit corresponds to a state in the trellis.

⁵ Optimum in the sense that the average codeword length is the minimum possible for the given free distance.






It was found that the performance of sequential decoding depends on the growth rate 
of the CDF, using a rather simplistic model (see Section 4.4.2). This, however, has not 
been supported much by the simulation results obtained. A more accurate model must 
therefore be used. We also think that a different measurement replacing the CDF must be 
introduced, one which will also include the path probabilities in addition to their distance. 
For instance, a VLEC code may be sequentially catastrophic only for a single pair of paths, 
whose probability of occurrence is negligibly small. Obviously this will perform much 
better than another sequentially catastrophic code with many possible paths exhibiting 
catastrophicity, even though in this case the CDF for both codes may be identical. Bounds 
on the computational effort for sequential decoding will also be useful. 

The relationship between the decoding window depth in the modified Viterbi decoder 
for a VLEC code and the constraint length of the code must be established theoretically. 
This relationship was briefly treated empirically in Section 3.6. This will enable the design 
of maximum likelihood decoders for VLEC codes with the minimum possible buffer size. 

Of mathematical interest is the relationship between the forbidden states at the initial 
and final part of the trellis for a VLEC code and its codeword lengths. No simple 
relationship could be found. This could be of interest when implementing a sequential 
decoder, since these forbidden states must be eliminated at the end of the tree so that the 
correct number of bits will be decoded in the final path. 

7.2. Some New Ideas 

One of the problems with VLEC codes is that the free distance cannot be increased 
without increasing the average codeword length (for an optimum code), hence decreasing 
the code rate. This has a two-fold disadvantage for a constant information rate system. 
First, a larger bandwidth will be required, and secondly, due to the increased bit rate on the 
channel, the energy per bit will decrease (for a constant transmitter output power) resulting 
in more errors (i.e. the cross-over probability of the derived BSC will be higher). Both of 
these effects will increase the SEP and somewhat offset the coding gain achieved by 






increasing the free distance. These problems may be solved using what we term as state- 
splitting VLEC codes. These are considered in Section 7.2.1. 
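The second effect, a higher cross-over probability at a lower code rate, can be made concrete with a small sketch. The channel model here is an assumption made for illustration (hard-decision coherent BPSK over an AWGN channel, for which the cross-over probability is Q(sqrt(2 R Eb/N0))), and the two code rates are illustrative values.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def crossover_probability(code_rate: float, eb_n0_db: float) -> float:
    """BSC cross-over probability for hard-decision coherent BPSK on AWGN (assumed model).

    eb_n0_db is the energy per information bit over N0; the energy per
    channel bit is scaled down by the code rate.
    """
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    return q_function(math.sqrt(2.0 * code_rate * eb_n0))


for rate in (0.729, 0.464):   # illustrative code rates
    print(f"R = {rate}: p = {crossover_probability(rate, 6.0):.3e}")
```

At a fixed Eb/N0 the lower-rate code sees the noisier derived BSC, which is the penalty that partly offsets the gain from a larger free distance.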

A finite state machine representation for VLEC codes is also possible, and this may 
enable other mathematical techniques to be used to characterise the properties of VLEC 
codes with maximum likelihood decoding. This is treated in Section 7.2.2. 

7.2.1. State-Splitting Variable-Length Error Correcting Codes 

In order to increase the free distance of VLEC codes without increasing the average 
codeword length, we can employ a technique which we call state-splitting; hence the 
resultant codes will be known as state-splitting VLEC (SSVLEC) codes. The idea is similar 
to that in convolutional codes, where the longer the constraint length, the larger the free 
distance possible. A longer constraint length directly translates into more states in the 
trellis for the convolutional code and this permits longer unmerged paths and hence larger 
free distance. The technique we propose here is similar. The states in a VLEC code are 
related solely to the corresponding starting bit position of each codeword. In SSVLEC 
codes, these states are artificially split into other states. We define the order of an SSVLEC
code to be the number of states representing a single position. Hence the VLEC codes 
considered in this thesis are SSVLEC codes of order one. 

Consider the code given in Table 3.3. This has constraint length 1.58, since we need
to store only three states in memory to build all subsequent states in the trellis.
However, if we were to split each one of these states in two, this will give an SSVLEC code
of order two with constraint length 2.58. Hence, while the average codeword length
remains the same, the constraint length increases. Now, however, each source symbol is no
longer mapped to a single codeword, but to multiple codewords, dependent on the current
state of the encoder. A possible code for this two-symbol source is given in Figure 7.1.
With this encoding, the free distance for the SSVLEC code is increased from four (for the
original code) to five, for the same code rate.
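The two constraint lengths quoted above are consistent with taking the constraint length as the base-2 logarithm of the number of states that must be stored. This reading is an inference from the quoted numbers, not a definition restated from Chapter 3.

```latex
\begin{align*}
\log_2 3 &\approx 1.58 \quad \text{(three states, order one)}, \\
\log_2 (2 \times 3) &= 1 + \log_2 3 \approx 2.58 \quad \text{(six states, order two)}.
\end{align*}
```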







Figure 7.1: SSVLEC code of order two



Obviously now the complexity of the encoder and the decoder increases 
correspondingly. In addition, we must now devise other algorithms to construct good 
SSVLEC codes. 

7.2.2. Finite State Variable-Length Error-Correcting Codes

The trellis constructed for a VLEC code $C$ ($s_1@L_1, b_1$; $s_2@L_2, b_2$; ...; $s_\sigma@L_\sigma, b_\sigma$;
$d_{min}$, $c_{min}$) in Chapter 3 has the disadvantage that the branch labels have a variable number
of output bits, resulting in transitions to within the same stage of the trellis. Because of
this, no finite state machine may be derived. We can eliminate this disadvantage by
keeping the number of output bits per branch label fixed. We can choose this number to be
any value, but the two most convenient ones are those corresponding to the minimum
codeword length ($L_1$) and the maximum codeword length ($L_\sigma$). Obviously, to retain the
same code mappings, we now need to vary the number of source symbols associated with
each branch label.

Consider first when the output branch labels are taken to be of length $L_1$ bits. In this
case, codewords of length greater than $L_1$ will have extra bits left over. These extra bits,
forming a word v, will determine the next state to which the encoder will jump. Each state
will thus be associated with a unique word v which all branches emitting out of this state
must output next. To be more precise, suppose that we are in some state v, with $|v| = l$,
$l < L_1$; then the input source symbol $a_i$, mapped to the codeword $c_i = c_{i1} c_{i2} \cdots c_{i|c_i|}$, will result in
a transition to state $c_{i,L_1-l+1} c_{i,L_1-l+2} \cdots c_{i|c_i|}$ with output $v c_{i1} c_{i2} \cdots c_{i,L_1-l}$.⁶ The initial state is state $\lambda$,
i.e. the state associated with the empty word (since initially there are no previous extra
bits). Any state v, with $v = v_1 v_2 \cdots v_l$ and $l > L_1$, will have a "spontaneous" (i.e. with no input
source symbols) transition to state $v_{L_1+1} v_{L_1+2} \cdots v_l$ with output $v_1 v_2 \cdots v_{L_1}$. All other states
have a single transition to a next state (not necessarily different from the present one) for
each one of the input source symbols. As an example, consider the state diagram for the
simple VLEC code given in Table 3.3, shown in Figure 7.2, where $\phi$ indicates a
"spontaneous" transition with no input symbols. It is very easy to see that because of the
way the finite state machine is constructed, the output is the same as if the source has been
encoded with code C using the mapping $a_i \to c_i$. The only difficulty that may arise with
this representation is at the end of the message. Here we must enforce that the encoder
also outputs the word associated with the final state.
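The construction with branch labels of the minimum codeword length can be sketched as a small encoder. The function name and the two-symbol code are hypothetical, and the sketch assumes a "spontaneous" transition is taken whenever the state word already holds at least as many bits as one branch label. The state is the word of leftover bits, and the word of the final state is flushed at the end of the message.

```python
def encode_fixed_min(message, code):
    """Encode with branch labels that all carry L1 = minimum codeword length bits.

    Returns the list of emitted words; the last entry is the word held by the
    final state (possibly empty), flushed at the end of the message.
    """
    l1 = min(len(c) for c in code.values())
    state = ""                      # initial state: the empty word
    emitted = []

    def drain():
        nonlocal state
        # "Spontaneous" transitions: emit stored bits without reading input
        # (assumption: taken whenever at least L1 bits are stored).
        while len(state) >= l1:
            emitted.append(state[:l1])
            state = state[l1:]

    for symbol in message:
        drain()
        codeword = code[symbol]
        take = l1 - len(state)      # codeword bits needed to complete the label
        emitted.append(state + codeword[:take])
        state = codeword[take:]     # leftover bits define the next state
    drain()
    emitted.append(state)           # flush the word of the final state
    return emitted


code = {"x": "00", "y": "11011"}    # hypothetical two-symbol code
message = "yxyy"
words = encode_fixed_min(message, code)
print(words)
```

Concatenating the emitted words gives the same bit sequence as concatenating the codewords directly, which is the equivalence claimed for this representation.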




Figure 7.2: Finite state diagram for the VLEC code of Table 3.3 with output branch labels of length $L_1$

The construction for the finite state machine for C taking $L_\sigma$ as the number of output
bits per branch label is very similar in principle, but results in an entirely different machine,
although the output for a given input message is the same⁷. One advantage with taking $L_\sigma$
as the number of output bits is that we eliminate the "spontaneous" transitions. However
now we may require more than a single input source symbol to effect a transition. The
state diagram is constructed as follows. Again, the initial state is state $\lambda$ associated with
the empty word. If the encoder is in state v, with $|v| = l$, then the input source symbol
sequence $a_{i_1} a_{i_2} \cdots a_{i_m}$, which is mapped to the codeword sequence $c_{i_1} c_{i_2} \cdots c_{i_m}$, causes the
encoder to emit the output bits $v c_{i_1} c_{i_2} \cdots c_{i_{m-1}} p$ and go to state s, where $ps = c_{i_m}$, such that
$|v c_{i_1} c_{i_2} \cdots c_{i_{m-1}} p| = L_\sigma$ for some m. Again, we must enforce that the encoder outputs the
word associated with the final state at the end of the message. Figure 7.3 gives the
corresponding finite state diagram for the same code.

⁶ i.e. the juxtaposition of the words v and $c_{i1} c_{i2} \cdots c_{i,L_1-l}$.
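The construction with branch labels of the maximum codeword length admits a similar sketch. Source symbols are read until a full label is available, the label is emitted, and the remaining suffix of the last codeword becomes the next state. The function name and the two-symbol code are hypothetical.

```python
def encode_fixed_max(message, code):
    """Encode with branch labels that all carry L_sigma = maximum codeword length bits.

    Returns the list of emitted words; the last entry is the word held by the
    final state (possibly empty), flushed at the end of the message.
    """
    l_sigma = max(len(c) for c in code.values())
    buffer = ""                     # word of the current state (initially empty)
    emitted = []
    for symbol in message:
        buffer += code[symbol]      # keep reading symbols until a label is complete
        if len(buffer) >= l_sigma:
            emitted.append(buffer[:l_sigma])   # stored word, whole codewords, then prefix p
            buffer = buffer[l_sigma:]          # remaining suffix s is the next state
    emitted.append(buffer)          # flush the word of the final state
    return emitted


code = {"x": "00", "y": "11011"}    # hypothetical two-symbol code
message = "yxyy"
words = encode_fixed_max(message, code)
print(words)
```

Because the stored word is always shorter than one label and no codeword is longer than a label, at most one label is emitted per input symbol, so no "spontaneous" transitions are needed.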




The implementation of maximum likelihood decoding for both these representations
is straightforward using the Viterbi algorithm. In addition, these new representations may
enable the use of other mathematical techniques with which to analyse VLEC codes. One
interesting aspect of these representations is the number of states required in the finite state
machine. For the particular case of the code of Table 3.3, both finite state machine representations
(Figures 7.2 and 7.3) result in four states. However, for other codes this number may be
different. In addition, the number of states is also different from that required by the trellis
representation presented in Chapter 3. For instance, for this code we only require three states
in the trellis representation instead of the four required with the finite state machine
representation.

⁷ It is the same if taken over the whole message; it will be different, however, if compared on a per source
symbol basis.

One other interesting possibility arising from this representation that could be 
investigated is that of constructing a new class of codes, which we call finite state VLEC 
(FSVLEC) codes, where the "codeword lengths" are matched to the source statistics 
automatically. First, a finite state machine representation for a Huffman code for the given
source is derived. The output branch labels are then replaced by new ones, of longer
length, such that some required free distance is achieved. The requirement that the prefix
of all output branch labels from state v is v may be dropped in this case to facilitate the
design. The resultant code will then no longer be equivalent to a VLEC code. Some codes
for an eight-symbol source were designed using similar ideas to those used by Pollara et al.
[1988] to construct finite state (fixed-length) codes. The resultant codes have good code
rates and for a free distance three code, better performance than a VLEC code constructed 
with the heuristic algorithm with the same free distance. 

FSVLEC codes are also considered by Piret [1982], who calls them comma-free
error-correcting codes of variable length. That author considers only codes for the BMS,
and the finite state machine is constructed such that the output branch labels are only
one bit long. The codes so constructed are consequently of low rate, resulting in poor
performance.






References 



Abramson N., Information theory and coding, New York, McGraw-Hill, 1963.

Bedi J.S., Dawood M.Z. & Iqbal R., "Simulation of comma free synchronization scheme 
and expected error span in variable length codes", Proc - 34th Midwest Symp. on Circuits 
& Systems, Vol. 1, pp. 253-256, 1992. 

Belongie M. & Heegard C., "Variable length trellis decoding", IEEE ISIT '93, San Antonio,
Texas, USA, p. 261, 17-22 Jan. 1993.

Bernard M.A. & Sharma B.D., "Some combinatorial results on variable-length error-
correcting codes", Ars Combinatoria, Vol. 25B, pp. 181-194, 1988.

Bernard M.A. & Sharma B.D., "A lower bound on average codeword length of variable 
length error-correcting codes", IEEE Trans. Inform. Theory, Vol. 36, No. 6, pp. 1474-1475, 
Nov. 1990. 

Bernard M.A. & Sharma B.D., "Variable length perfect codes", J. Inform. & Optimization 
Sciences, Vol. 13, No. 1, pp. 143-151, 1992. 

Buttigieg V., "Combined source and error-control coding", M.Sc. Dissertation, University
of Manchester, 1992. 

Buttigieg V. & Farrell P.G., "A maximum likelihood decoding algorithm for variable- 
length error-correcting codes", Proc. 5th Bangor Symposium on Communications, Bangor, 
Wales, pp. 56-59, 2-3 Jun. 1993a. 

Buttigieg V., "Two algorithms to calculate the distance spectrum of VLEC codes", Internal
Report, Comms. Res. Group, University of Manchester, pp. 1-15, Apr. 1994a. 

Buttigieg V. & Farrell P.G., "On variable-length error-correcting codes", Proc. 1994 IEEE 
ISIT, Trondheim, Norway, p. 507, 27 Jun. - 1 Jul. 1994b. 

Buttigieg V. & Farrell P.G., "Sequential decoding of variable-length error-correcting 
codes", Proc. Eurocode 94, Cote d'Or, France, pp. 93-98, 24-28 Oct. 1994c. 

Buttigieg V. & Farrell P.G., "A maximum a-posteriori (MAP) decoding algorithm for 
variable-length error-correcting codes", Codes and cyphers: Cryptography and coding IV, 
Essex, England, The Institute of Mathematics and its Applications, pp. 103-119, 1995. 

Calabi L. & Hartnett W.E., "A family of codes for the correction of substitution and
synchronization errors", IEEE Trans. Inform. Theory, Vol. IT-15, No. 1, pp. 102-106, Jan. 
1969a. 

Calabi L. & Hartnett W.E., "Some general results of coding theory with applications to the 
study of codes for the correction of synchronization errors", Inform. & Control, Vol. 15, pp. 
235-249, 1969b. 




Calabi L. & Arquette L.K., "Basic properties of error-correcting codes", Foundations of
coding theory, Dordrecht, Holland, D. Reidel Publishing Co., pp. 17-38, 1974a.

Calabi L. & Arquette L.K., "A study of error-correcting codes, I", Foundations of coding
theory, Dordrecht, Holland, D. Reidel Publishing Co., pp. 40-59, 1974b.

Capocelli R.M., "A note on uniquely decipherable codes", IEEE Trans. Inform. Theory,
Vol. IT-25, No. 1, pp. 90-94, Jan. 1979.

Capocelli R.M., "A decision procedure for finite decipherability and synchronizability of 
multivalued encodings", IEEE Trans. Inform. Theory, Vol. IT-28, No. 2, pp. 307-318, Mar. 
1982. 

Capocelli R.M., Gargano L. & Vaccaro U., "On the characterization of statistically 
synchronizable variable-length codes", IEEE Trans. Inform. Theory, Vol. 34, No. 4, pp. 
817-825, Jul. 1988. 

Capocelli R.M., De Santis A.A., Gargano L. & Vaccaro U., "On the construction of
statistically synchronizable codes", IEEE Trans. Inform. Theory, Vol. 38, No. 2,
pp. 407-414, Mar. 1992.

Cedervall M. & Johannesson R., "A fast algorithm for computing distance spectrum of 
convolutional codes", IEEE Trans. Inform. Theory, Vol. 35, No. 6, pp. 1146-1159, Nov. 
1989. 

Chevillat P.R. & Costello D.J., "Distance and computation in sequential decoding", IEEE 
Trans. Commun., Vol. COM-24, pp. 440-447, Apr. 1976. 

Chevillat P.R. & Costello D.J. Jr., "An analysis of sequential decoding for specific time- 
invariant convolutional codes", IEEE Trans. Inform. Theory, Vol. IT-24, No. 4, pp. 443- 
451, Jul. 1978. 

Clark G.C. Jr & Cain J.B., Error correction coding for digital communications, New York, 
Plenum Press, 1981. 

Dunscombe E.J., "Some applications of mathematics to coding theory", PhD. Thesis, 
Royal Holloway and Bedford New College, (University of London), 1988. 

El Gamal A.A., Hemachandra L.A., Shperling I. & Wei V.K., "Using simulated annealing 
to design good codes", IEEE Trans. Inform. Theory, Vol. IT-33, No. 1, pp. 116-123, Jan. 
1987. 

Escott A., "A new performance measure for a class of two-length error correcting codes",
Codes and cyphers: Cryptography and Coding IV, Essex, England, The Institute of
Mathematics and its Applications, pp. 167-181, 1995.

Even S., "Tests for unique decipherability", IEEE Trans. Inform. Theory, Vol. IT-9, pp. 
109-112, Apr. 1963. 

Fano R.M., "A heuristic discussion of probabilistic decoding", IEEE Trans. Inform. 
Theory, Vol. IT-9, pp. 64-74, Apr. 1963. 

Farrell P.G., "Linear binary anticodes", Electron. Lett., Vol. 6, No. 13, pp. 419-421, Jun. 
1970. 

Farrell P.G. & Farrag A., "Further properties of linear binary anticodes", Electron. Lett.,
Vol. 10, No. 16, p. 340, Aug. 1974. 






Farrell P.G., "An introduction to anticodes", Internal Report, Kent, England, The 
University of Kent at Canterbury, 1977. 

Ferguson T.J. & Rabinowitz J.H., "Self-synchronizing Huffman codes", IEEE Trans.
Inform. Theory, Vol. IT-30, No. 4, pp. 687-693, Jul. 1984. 

Forney G.D. Jr., "Convolutional codes III. Sequential decoding", Inform. & Control, Vol. 
25, No. 3, pp. 267-297, Jul. 1974.

Gilbert E.N. & Moore E.F., "Variable-length binary encodings", Bell Sys. Tech. J., Vol. 38, 
pp. 933-967, Jul. 1959. 

Gilbert E.N., "Synchronization of binary messages", IRE Trans. Inform. Theory, Vol. IT-6, 
pp. 470-477, 1960. 

Golomb S.W., Gordon B. & Welch L.R., "Comma-free codes", Can. J. Math., Vol. 10, pp. 
202-209, 1958. 

Hartnett W.E. (Ed), Foundations of coding theory, Dordrecht, Holland, D. Reidel
Publishing Co., 1974.

Hatcher T.R., "On a family of error-correcting and synchronizable codes", IEEE Trans. 
Inform. Theory, pp. 620-624, Sep. 1969. 

Hazeltine B., "Regular expressions and variable length encodings", IEEE Trans. Inform. 
Theory, Vol. IT-9, p. 48, Jan. 1963. 

Hollmann H.D.L., "A relation between Levenshtein-type distances and insertion-and- 
deletion correcting capabilities of codes", IEEE Trans. Inform. Theory, Vol. 39, No. 4, pp. 
1424-1427, Jul. 1993. 

Huffman D. A., "A method for the construction of minimum redundancy codes", Proc. IRE, 
Vol. 40, pp. 1098-1101, Sep. 1952. 

Jacobs I.M. & Berlekamp E.R., "A lower bound to the distribution of computation for 
sequential decoding", IEEE Trans. Inform. Theory, Vol. IT-13, No. 2, pp. 167-174, Apr. 
1967. 

Jelinek F., "A fast sequential decoding algorithm using a stack", IBM J. Res. and Dev., Vol. 
13, pp. 675-685, Nov. 1969. 

Jeruchim M.C., Balaban P. & Shanmugan K.S., Simulation of communications systems,
New York, Plenum Press, 1992. 

Kendall W.B. & Reed I.S., "Path-invariant comma-free codes", IRE Trans. Inform. Theory, 
Vol. IT-8, pp. 350-355, Oct. 1962. 

Kraft L.G., "A device for quantizing, grouping, and coding amplitude modulated pulses", 
MS. Thesis, Electrical Engineering Department, M.I.T., Mar. 1949. 

Kruskal J.B., "An overview of sequence comparison: Time warps, string edits, and 
macromolecules", SIAM Review, Vol. 25, No. 2, pp. 201-237, Apr. 1983. 

Levenshtein V.I., "Binary codes with correction of deletions, insertions and substitution of 
symbols", Dokl. Akad. Nauk SSSR, Vol. 163, pp. 845-848, 1965a.

Levenshtein V.I., "Binary codes capable of correcting spurious insertions and deletions of 
ones", Probl. Peredachi Inform., Vol. 1, No. 1, pp. 12-25, 1965b. 






Levenshtein V.I., "Binary codes capable of correcting deletions, insertions and reversals",
Sov. Phys. Doklady, Vol. 10, p. 707, 1966. 

Levy J.E., "Self-synchronizing codes derived from binary cyclic codes", IEEE Trans.
Inform. Theory, Vol. IT-12, No. 3, pp. 286-290, Jul. 1966. 

Lin S. & Costello D.J., Error control coding: Fundamentals and applications, Englewood 
Cliffs, Prentice-Hall, 1983. 

MacWilliams F.J. & Sloane N.J.A., The theory of error correcting codes, Amsterdam, 
North-Holland Publishing Comp., 1978. 

Masek W.J. & Paterson M.S., "How to compute string-edit distances quickly", Time warps,
string edits and macromolecules: Theory and practice of sequence comparison, Reading,
Massachusetts, Addison-Wesley, pp. 337-349, 1983.

Massey J.L., "Variable-length codes and the Fano metric", IEEE Trans. Inform. Theory, 
Vol. IT-18, No. 1, pp. 196-198, Jan. 1972. 

Massey J.L., "Joint source and channel coding", Communication Systems and Random
Process Theory, Alphen aan den Rijn, The Netherlands, Sijthoff & Noordhoff, Vol. NATO-ASI
series E, No. 25, pp. 279-293, 1978.

Maxted J.C. & Robinson J.P., "Error recovery for variable length codes", IEEE Trans.
Inform. Theory, Vol. IT-31, No. 6, pp. 794-801, Nov. 1985. 

McEliece R.J., "The Viterbi decoding complexity of linear block codes", Proc. 1994 IEEE 
ISIT, Trondheim, Norway, p. 341, 27 Jun. - 1 Jul. 1994. 

McMillan B., "Two inequalities implied by unique decipherability", IRE Trans. Inform. 
Theory, Vol. IT-2, pp. 115-116, Dec. 1956. 

Monaco M.E. & Lawler J.M., "Corrections and additions to 'Error recovery for variable 
length codes'", IEEE Trans. Inform. Theory, Vol. IT-33, No. 3, pp. 454-456, May 1987. 

Montgomery B.L. & Abrahams J., "Synchronization of binary source codes", IEEE Trans. 
Inform. Theory, Vol. IT-32, No. 6, pp. 849-854, Nov. 1986. 

Neumann P.G., "Efficient error-limiting variable-length codes", IRE Trans. Inform. Theory, 
Vol. IT-8, pp. 292-304, Jul. 1962a. 

Neumann P.G., "On a class of efficient error-limiting variable-length codes", IRE Trans. 
Inform. Theory, Vol. IT-8, No. 5, pp. S260-S266, Sep. 1962b. 

Neumann P.G., "Error-limiting coding using information-lossless sequential machines", 
IEEE Trans. Inform. Theory, pp. 108-115, Apr. 1964.

Papoulis A., Probability, random variables, and stochastic processes, New York, McGraw- 
Hill, 1965. 

Piret P., "Comma free error correcting codes of variable length, generated by finite-state 
encoders", IEEE Trans. Inform. Theory, Vol. IT-28, No. 5, pp. 764-775, Sep. 1982. 

Pless V., "Remarks on greedy codes", Lecture Notes in Comput. Sci., Vol. 573, pp. 58-67,
1992. 

Pollara F., McEliece R.J. & Abdel-Ghaffar K., "Finite-state codes", IEEE Trans. Inform.
Theory, Vol. 34, No. 5, pp. 1083-1089, Sep. 1988. 






Rahman M. & Misbahuddin S., "Effect of a binary symmetric channel on the 
synchronisation recovery of variable length codes", The Computer J., Vol. 32, No. 3, pp. 
246-251, 1989. 

Ramamoorthy C.V. & Tufts D.W., "Reinforced prefixed comma-free codes", IEEE Trans. 
Inform. Theory, Vol. IT-13, No. 3, pp. 366-371, Jul. 1967. 

Rouanne M. & Costello D.J., "An algorithm for computing the distance spectrum of trellis 
codes", IEEE J. Selected Areas in Commun., Vol. 7, No. 6, pp. 929-940, Aug. 1989. 

Sardinas A.A. & Patterson G.W., "A necessary and sufficient condition for unique 
decomposition of coded messages", Conv. Rec. IRE, Pt. 8, pp. 104-108, 1953. 

Sato K., "A decision procedure for the unique decipherability of multivalued encodings", 
IEEE Trans. Inform. Theory, Vol. IT-25, No. 3, pp. 356-360, May 1979. 

Savage J.E., "Sequential decoding - the computation problem", Bell System Tech. J., Vol. 
46, No. 1, pp. 149-175, Jan. 1966a. 

Savage J.E., "The distribution of the sequential decoding computation time", IEEE Trans. 
Inform. Theory, Vol. IT-12, No. 2, pp. 143-147, Apr. 1966b. 

Scholtz R.A., "Codes with synchronization capability", IEEE Trans. Inform. Theory, Vol. 
IT-12, No. 2, pp. 135-142, Apr. 1966. 

Scholtz R.A., "Maximal and variable word-length comma-free codes", IEEE Trans. 
Inform. Theory, Vol. IT-15, No. 2, pp. 300-306, Mar. 1969.

Scholtz R.A., "Frame synchronization techniques", IEEE Trans. Commun., Vol. COM-28, 
No. 8, pp. 1204-1212, Aug. 1980. 

Schouhamer Immink K.A., "Runlength-limited sequences", Proc. IEEE, Vol. 78, No. 11, 
pp. 1745-1759, Nov. 1990. 

Sellers F.F., "Bit loss and gain correction code", IRE Trans. Inform. Theory, Vol. IT-8, pp. 
35-38, 1962. 

Shannon C.E., "A mathematical theory of communication", Bell System Tech. J., Vol. 27, 
pp. 623-656, 1948. 

Sklar B., Digital communications: Fundamentals and applications, Englewood Cliffs, New 
Jersey, Prentice Hall, 1988. 

Sodha J. & Tait D., "Node synchronisation for high rate convolutional codes", Electron. 
Lett., Vol. 28, No. 9, pp. 810-812, Apr. 1992. 

Stiffler J.J., Theory of synchronous communication, Englewood Cliffs, NJ, Prentice-Hall, 
1971. 

Sweeney P., Error control coding - An introduction, New York, Prentice Hall, 1991. 

Takishima Y., Wada M. & Murakami H., "Error states and synchronization recovery for 
variable length codes", IEEE Trans. Commun., Vol. 42, No. 2/3, pp. 783-792, Feb. 1994. 

Tanaka E. & Kasai T., "Synchronization and substitution error-correcting codes for the 
Levenshtein metric", IEEE Trans. Inform. Theory, Vol. IT-22, No. 2, pp. 156-162, Mar. 
1976. 

Tanenbaum A.S., Computer Networks (2nd Ed), Englewood Cliffs, Prentice-Hall, 1988. 






Tavares S.E. & Fukada M., "Matrix approach to synchronization recovery for binary cyclic 
codes", IEEE Trans. Inform. Theory, Vol. IT-15, No. 1, pp. 93-101, Jan. 1969.

Titchener M.R., "Digital encoding by means of new T-codes to provide improved data 
synchronisation and message integrity", IEE Proc., Vol. 131, Pt. E, No. 4, pp. 151-153, Jul.
1984. 

Titchener M.R., "A question of ambiguity: Introduction to the T-codes", Personal 
communication, Aug. 1988. 

Ullman J.D., "Near-optimal, single-synchronization-error-correcting code", IEEE Trans. 
Inform. Theory, Vol. IT-12, No. 4, pp. 418-424, Oct. 1966. 

Ullman J.D., "On the capabilities of codes to correct synchronization errors", IEEE Trans. 
Inform. Theory, Vol. IT-13, No. 1, pp. 95-105, Jan. 1967.

Vembu S., Verdu S. & Steinberg Y., "The source-channel separation theorem revisited", 
IEEE Trans. Inform. Theory, Vol. 41, No. 1, pp. 44-54, Jan. 1995. 

Viterbi A.J., "Error bounds for convolutional codes and an asymptotically optimum 
decoding algorithm", IEEE Trans. Inform. Theory, Vol. IT-13, pp. 260-269, 1967.

Viterbi A.J., "Convolutional codes and their performance in communication systems",
IEEE Trans. Commun. Technol., Vol. COM-19, No. 5, pp. 751-772, Oct. 1971.

Viterbi A.J. & Omura J.K., Principles of digital communication and coding, New York, 
McGraw-Hill, 1979. 

Wei V.K. & Scholtz R.A., "On the characterization of statistically synchronizable codes", 
IEEE Trans. Inform. Theory, Vol. IT-26, No. 6, pp. 733-735, Nov. 1980. 

Wolf J.K., "Efficient maximum likelihood decoding of linear block codes using a trellis", 
IEEE Trans. Inform. Theory, Vol. IT-24, No. 1, pp. 76-80, Jan. 1978. 

Zigangirov K.S., "Some sequential decoding procedures", Probl. Peredach. Inform., Vol. 2, 
No. 2, pp. 13-25, 1966.






Appendix A 



VLEC Codes for the 26-Symbol English Source and the 128-Symbol ASCII 
Source 



Source Symbol   Probability   α1-prompt Code      α1,1-prompt Code
e               0.1270        0000000             0000000
t               0.0906        1000110             0001111
a               0.0817        0100101             0010011
o               0.0751        0010011             0011100
i               0.0697        0001111             0100101
n               0.0674        1100011             0101010
s               0.0633        1010101             0110110000
h               0.0609        1001001             0110110111
r               0.0599        0110110             0111001000
d               0.0425        0101010             0111001111
l               0.0403        0011100             1000110000
c               0.0278        1110000             1000110111
u               0.0276        1101100             1001001000
m               0.0241        1011010000          1001001111
w               0.0236        1011010111          1010101000
f               0.0223        0111001000          1010101111
g               0.0202        0111001111          1011010000
y               0.0197        111111111100        1011010111
p               0.0193        111111100111        1100011000
b               0.0149        1111111100011       1100011111
v               0.0098        1111111010101       1101100000
k               0.0077        11111110000000      1101100111
j               0.0015        11111111101100      1110000000
x               0.0015        11111111001001000   1110000111
q               0.0010        11111110100101000   1111111000
z               0.0007        11111110100101111   1111111111

Table A.1: α1-prompt and α1,1-prompt codes for the 26-symbol English source.
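
As a rough cross-check of Table A.1, the sketch below tests whether either code contains a codeword that is a prefix of another and computes the average codeword length of each. The probabilities and codewords are transcribed from the table as read here; the names `is_prefix_free` and `average_length` are illustrative helpers, not part of the thesis.

```python
# Illustrative sketch: data transcribed from Table A.1 (as read from the scan).
SYMBOLS = "etaoinshrdlcumwfgypbvkjxqz"
PROBS = [0.1270, 0.0906, 0.0817, 0.0751, 0.0697, 0.0674, 0.0633, 0.0609,
         0.0599, 0.0425, 0.0403, 0.0278, 0.0276, 0.0241, 0.0236, 0.0223,
         0.0202, 0.0197, 0.0193, 0.0149, 0.0098, 0.0077, 0.0015, 0.0015,
         0.0010, 0.0007]

ALPHA_1 = """0000000 1000110 0100101 0010011 0001111 1100011 1010101
1001001 0110110 0101010 0011100 1110000 1101100 1011010000 1011010111
0111001000 0111001111 111111111100 111111100111 1111111100011
1111111010101 11111110000000 11111111101100 11111111001001000
11111110100101000 11111110100101111""".split()

ALPHA_1_1 = """0000000 0001111 0010011 0011100 0100101 0101010
0110110000 0110110111 0111001000 0111001111 1000110000 1000110111
1001001000 1001001111 1010101000 1010101111 1011010000 1011010111
1100011000 1100011111 1101100000 1101100111 1110000000 1110000111
1111111000 1111111111""".split()


def is_prefix_free(words):
    """True if no codeword is a proper prefix of (or equal to) another."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def average_length(words, probs):
    """Expected codeword length in bits per source symbol."""
    return sum(p * len(w) for w, p in zip(words, probs))


if __name__ == "__main__":
    for name, code in (("alpha_1", ALPHA_1), ("alpha_1,1", ALPHA_1_1)):
        print(name, is_prefix_free(code), round(average_length(code, PROBS), 4))
```

The tabulated probabilities sum to slightly more than one because of rounding, so the averages are approximate.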






Source    C15            C16            C17              C18          C19
Symbol    R = 0.5894     R = 0.5890     R = 0.5894       R = 0.5228   R = 0.5584
e         011111         011100         11110            000000001    0011000
t         0000111        0001101        010111           000111111    10000000
a         0011000        0010010        0011101          011000110    01100011
o         10111011       01001000       01101001         101011000    000010010
i         01001001       10010001       10011000         110101010    011001000
n         10000100       10101110       00000110         111110101    010100101
s         101011010      100000100      110010100        1111111100   111110011
h         011000110      101110001      000100010        0101001110   100001111
r         101101101      111001111      001011000        1001100010   101111100
d         110010101      010111010      0010000010       1000001101   1111001001
l         1010001001     1001011010     1010111011       0010101000   1010101100
c         1110010110     oiooimio       0100001111       1010010110   0010011011
u         0101001110     1010100100     1000010100       1111000001   0001101111
m         1001110111     1100010101     01000000000      0011011011   0110100010
w         1101100000     01000110001    10000011011      0100010000   1101111010
f         10101000100    01101111010    10101101100      0110100111   0111110101
g         01010010011    10110111000    100000111000     1100111011   1001010100
y         11100110001    11001011111    100011110101     10101111111  10100101010
p         11000111110    10001101100    0010010111001    11000001110  11110111100
b         111000001101   10011100011    1000111110100    11100110100  10101010111
v         100000111010   110010110000   00100101111101   01001010101  11010001011
k         100100100001   110001001011   10000001110100   10010010111  01111010001
j         110100010100   110101100110   10001101111011   00000110010  11000110001
x         1110100001100  110111111101   10101111001010   01111100110  00010110010
q         1111001000000  111111000000   101011110011011  01011011010  00101101101
z         1101010100010  1101010011011  100011011111001  11101001001  01001100110

Table A.2: Various VLEC codes for the 26-symbol English source with d_free = 5
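
The R values in the column headings can be related to the codeword lengths. The sketch below computes the average codeword length of C15 from the tabulated codewords and the Table A.1 probabilities, then forms the ratio of a fixed-length representation (5 bits for 26 symbols) to that average. Reading R as this ratio is an assumption made here for illustration; this appendix does not define R, and the helper name `average_length` is hypothetical.

```python
import math

# Probabilities for e, t, a, o, i, n, s, h, r, d, l, c, u, m, w, f, g, y, p,
# b, v, k, j, x, q, z, transcribed from Table A.1 (as read from the scan).
PROBS = [0.1270, 0.0906, 0.0817, 0.0751, 0.0697, 0.0674, 0.0633, 0.0609,
         0.0599, 0.0425, 0.0403, 0.0278, 0.0276, 0.0241, 0.0236, 0.0223,
         0.0202, 0.0197, 0.0193, 0.0149, 0.0098, 0.0077, 0.0015, 0.0015,
         0.0010, 0.0007]

# Codewords of C15 in the same symbol order, transcribed from Table A.2.
C15 = """011111 0000111 0011000 10111011 01001001 10000100 101011010
011000110 101101101 110010101 1010001001 1110010110 0101001110
1001110111 1101100000 10101000100 01010010011 11100110001 11000111110
111000001101 100000111010 100100100001 110100010100 1110100001100
1111001000000 1101010100010""".split()


def average_length(words, probs):
    """Expected codeword length in bits per source symbol."""
    return sum(p * len(w) for w, p in zip(words, probs))


# Assumption: compare against the shortest fixed-length binary code
# that can label every source symbol (5 bits for 26 symbols).
FIXED_BITS = math.ceil(math.log2(len(C15)))
AVG_C15 = average_length(C15, PROBS)
RATIO_C15 = FIXED_BITS / AVG_C15

if __name__ == "__main__":
    print(f"average length = {AVG_C15:.4f} bits, ratio = {RATIO_C15:.4f}")
```

Under this assumption the ratio comes out close to the R = 0.5894 listed for C15, which suggests, but does not confirm, that the tabulated rate is measured against a 5-bit fixed-length code.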






Source    C20, d_free = 5   C21
Symbol    R = 0.5906        R = 0.4673
e         011100            1111111
t         0010010           01110001
a         00111110          010010111
o         10010001          0010110100
i         01000111          0001001001
n         11101000          00011000101
s         001010110         10100011000
h         110000100         100110010001 1
r         101101111         001000101011
d         000110001         010011000110
l         110011011         101001011100
c         1010001010        0001110111001
u         0100101011        1010100001101
m         1101010011        1000011010000
w         1000111101        1100010101110
f         0101110100        00010111101001
g         01001001010       00101000110110
y         11100110111       10001010001101
p         10111101011       11000000111000
b         11011011111       01000110000010
v         11101111000       000101111001101
k         10010110000       101010011111000
j         100001101100      100101010010000
x         100010111010      010001100011011
q         010011000110      100011000101101
z         110110101001      110000000000110

Table A.3: Two VLEC codes constructed using the heuristic construction with the majority
voting algorithm for the 26-symbol English source



Source Symbol   Probability   C22, d_free = 5, R = 0.7292   C23, d_free = 7, R = 0.4632
space           1.09x10^-1    00000000                      000000000000010
ht              9.58x10^-2    00011111                      100001110110110
e               7.20x10^-2    11100011                      010001101101000
t               5.16x10^-2    11111100                      110000011011100
o               5.15x10^-2    000100101                     001011011100100
i               4.68x10^-2    111001101                     101010101010000
nl              4.30x10^-2    010010011                     011010110001110
n               4.15x10^-2    100111000                     111011000111010
s               3.82x10^-2    0000101010                    000110111111010
r               3.77x10^-2    0001000111                    100111001001110
a               3.32x10^-2    0010010100                    010111010010000
d               2.60x10^-2    0161011001                    110110100100100
c               2.53x10^-2    1010001001                    001101100011100
h               2.33x10^-2    1100110011                    101100010101000





7 

/ 


Z.l /X1U 


i 1 m 1 nni nn 

1 1U1 1UU1UU 


mil nnnm i innn 

Ul 1 1UUUU1 1 1U1 1U 


! r 
J 


1 new 1 a "2 
1 .93X10 


nnnnm 1 ni 1 n 

UUUUUl 1U1 1U 


1111 ni 111 nnnni n 

1111U111 lUUUUlU 


* 


i .9IX1U 


nnnm 1 ni nm 

UUUU 1 1 U 1 UU 1 


nnnni nnnm 1 1 1 ni 

UUUU 1 UUUU 1 1 1 1 U 1 


771 


1 O Aw 1 A 

1 .80X10 


nni m nm 1 1 n 

UU1U1UU1 1 1U 


i nnm 111 nnm nm 

lUUUl 1 1 1 UUUl UUl 


U 


1.7 /X10 


nm i ni i nnm 

UUl 1U1 1UUU 1 


ni nni 1 ini mm 1 1 

U 1 UU 111U1U1U111 


L 


1 "5 1 A*2 

1 .37X10 


m m ni ni 1 1 1 

UIUIUIUI 111 


1 1 nni nni 1 1 nnm 1 

1 1 UUl UUl 1 1UUU1 1 




1 ") w 1 A"2 

1.23X10 


m ni 1 nnm nn 

U1U1 1UUU1UU 


nni nni ni 1m mi 1 
UU1UU1U1 1U1 1U1 1 


g 


1 1 C w 1 A "2 

1.15X10 


m 1 1 nni i m n 
Oil 1001 1010 


1 m nnm niimiii 
1 U 1 UUU 1U11U1111 


iV 


1.12x10 


i ni 1 nnnm 1 1 
101 1UUU01 1 1 


mi nnm ini 1 nnm 
u 1 1 UUU 1 1 u 1 1 UUU 1 


- 


T Tlvy 1 A~3 

7.72x10 


1 1 nm m nnm 
1 1UU1U100U1 


iii nm nnnnnm ni 

1 1 1UU1UUUUUU1U1 


* 


7.26x10 


r\nnr\f\ 1 n 1 1 n 1 n 
00000101 1010 


nnm nm 1 1 nnm m 

UUUl UUl 1 1UUU1U1 


w 


7.24X10 


nnm 1 nnm ni 1 
UUUl 1UU01U1 1 


1 nni m nm 1 1 nnm 

1UU1U1UU1 1 1UUU1 


y 


ZT A T+. * 1 A*3 

6.47x10 


aaai i ai aai nn 
0001 10100100 


ninimmmni 1 1 1 
U1U1U1U1U1U1111 




s~ /i i . ,i a -3 

6.41x10 


AAI AAI 1 1 All 1 

001001 110111 


1 ini nni nnm mi 1 
1 1 U 1 UU 1 UUU 1 1 u 1 1 


( 


6.33X10 


nni 1 1 if\mnr\n 

UUl 1 1 l\Jl\JvJUU 


nn 11111 nn 1 nnn 1 1 
UU 11111 UU 1 UUU 11 




6.31x10 


m 1 nni nm i nn 
U11UU1UU11UU 


1 n 1 1 1 nn 1 nn 1 n 1 1 1 
1 U 1 1 1 UU 1 UU 1 U 1 1 1 


J 


^ O ZTv > 1 A "3 

6.26x10 


m mini i imn 
Ul 1U1U1 1 1U1U 


m 1 1 1 nnm nni nm 
u 1 1 1 1 UUU 1 uu 1 uu 1 




c An. . 1 a~3 

5.98x10 


1 m 1 m 1 m m n 
1U1 1U1 1U1U1U 


1111111111111 m 

111111111111 1U1 




5.88x10 


1 1 nnm 1 1 f\r\r\r\ 
1 1UUU1 1 1U00U 


nnnni nni imm 1m 

UUUU 1 UUl 1U1U11U1 


r 

L 


5.75x10 


i 1 ni 1 1 ni ni i n 
1 101 1 10101 10 


1 nnm 1 1 ni 1 nnm ni 
1 UUU 111011 00U 1 U 1 


> 


5.62x10 


1 1 1 1 1A1 1A1A1 
1111 101 10101 


ai aai 111 ai 111 nm 
0100111101111001 


7 
J 


5.57x10 


AAAI AAI 1 A1 1 1 A 
0001001101110 


1 1 nm aaaaaai nnm 
1 1 UU 1 UUUUUU 1 UUU 1 


V 


5.46x10 


AAA1 A1 A1 1 AAAA 
00010101 10000 


AAI AA1 AAA1 1 AAAA1 

UU 1 UU 1 UUU 1 1 UUUU 1 




a *"T /~\.. „ i a~3 

4.70x10 


AA1AAA1 1 1 1A1 1 
0010001 111011 


1 AI AAAI 1 AAAA1 AA 1 

1 U 1 000 1 1 0000 1 00 1 


U 


/I A A ^ , A A~3 

4.44x10 


AAI AA1 AAAI 1 1A 
0010010001 110 


All AAA 1A1A1 1A1A1 

U 1 1 UUU 1U1U11U1U1 


1 


A AO».*1 A~3 

4.03x10 


AA 1A1 1A1A1 A A 1 

UU 101 10101001 


111 nm m 1 im 1 mi 
1 1 1UU1U1 11U111U1 


1 


3.65x10 


AA1 1 1A1A1AA1A 

001 1 101010010 


AAAI AA1 AA1 AI 1 1 AI 
000100100101 1 101 


1 


3.16x10 


Ai Ai 1A1A1A1A1 

U1U11U1U1U1U1 


inmnininm imm 
1UU1U1U1UU1 1U1U1 


n. 
# 


3.1 1x10 


A1 A1 1 1 AAAA1 1 A 

U101 1 100001 10 


m ni m nm nnm nm 
U 1 U 1 U 1 UU 1 UUU 1 UU 1 


h 


3.1 1x10 


ni ini 1 ai 1 aai a 
01 101 101 10010 


1 1 m nm 1111 nnnm 
1 1 U 1 UU 11111 UUUU 1 


c 


2.29x10 


1 nnnni i nm nnn 
1 UUUU 1 1 UU 1 UUU 


nm 111111 nm nnm 

UUl 11111 1 UUl UUUl 


K 


O O0\/1 A*3 

2.28X10 


1 nnm i inini 1 1 
1UUU1 11U1U111 


miii nnm 1111 nm 
1 u 1 1 1 UUU 11111 uu 1 


V 


o i i a*3 
2.16X10 


1 nni 1 ni nni m 1 

1UU1 1U1UU1U1 1 


m 1 1 1 nni m nnm m 

Ul 1 1 1UU1UIUUU1U1 


o 


1 0*7w 1 A"3 

1.87X10 


1 n i n i nn 1 nn/1 1 n 
1U1U1UU1UU1 1U 


1111111 nnn 1 n 1 1 n 1 
1111111 UUU 1 u 1 1 u 1 




1 T7v/1 A"3 

1.77x10 


1 1 m nm 1 1 nni n 
1 1U1UU1 1 1UU1U 


nnnnnnni 11m nni i 

UUUUUUUl 1 1U1UU1 1 




1 *7Av/ 1 A "3 

1.70X10 


nnnnm ni ni m nn 
UUUUU 1 U 1 U 1 U 1 UU 


1 nnnni imm 1 mi 1 

1UUUU1 1U1U1 1-1 Ul 1 




1 /Clwl A~3 

1.67X10 


nnnm ni i nm m n 
UUUU 1 U 1 1 UU 1 U 1 U 


mnnm 1 1 nnnnm 1 1 

U1UUU1 1 1 UUUUUl 1 1 


W               1.64x10^-3    00010101101111                1100000001101111
A               1.61x10^-3    00100110101110                0010110000011111
O               1.61x10^-3    00111001010110                1010101101110111
X               1.57x10^-3    00111001101000                0110101011001011
                1.56x10^-3    01001110010101                1110110110100011
L               1.56x10^-3    01011010000110                0001101000100011





r\ 
U 


1 COnxI a*3 
1 .32X1 U 


m 1 nm niiiniii 
Ul 1UU1U1 1 1 u 1 1 1 


i nni 1 1 ni m nni ni i 

1 UU 1 1 1 U 1 U 1 UU 1 U 1 1 


K 


t A A v/* 1 A "3 

1.44X10 


u 1 1 1 uu 1 u 1 1 1 uu 1 


n i n 1 1 1 nn iiiimii 

U 1 U 1 1 1 OU 1111U111 


J 


1 1 Q\/ 1 A "3 

1.J9X10 




iiniimii nm 1111 
11U11U111 uu 1 1 1 1 1 


i 

T 


1.36X10 


1 UUU 111U11U11U 


nm 1A111111A1111 

UU 11U111111U1111 




1.26x10 


mini nn 1 1 1 1 n 1 1 
AUIUIUUI 1 1 1U1 1 


1 ni 1 aaaai Annm 1 1 
1U1 1UU001000U1 1 1 


n 

y 


1.15X10 


1 minim 1 1 nn 1 n 
11U1U1U11 1UU 1U 


ni 1 1 AAAi nm 1 1 m 1 
U 1 1 1 000 1 00 111011 


M 


1.05X10 


1 1m 1 ini ini nn 1 
1 1U1 1 1U1 1U1UU1 


1 1 1 1 n 1 1 nn 1 n 1 nn 1 1 
1 1 1 100 101 001 1 


< 


9.51x10 


AAAA1 1AA1AA1A1 1 
UOUU 11 00 i 00 1011 


aaaai ni ni nni 1 nm n 
ouuu 1 0 1 0 1 00 1 1 00 1 0 


n 
£> 


9.18x10 


AAAA1 1 1 AA1 AI 1 in 

000U 111 00 1 0 1 1 1 u 


1 AAAI 1 AI 1 1 1 1 AAAI A 

1 UUO 11011111 000 10" 


Z 


8.52x10 


nnni ni nni ni 1 nnn 
000 101001011 000 


ni nni 1 nnni nni 1 ai a 
010011 000 10011010 


4 


*7 c /ivy i rv*4 

7.54x10 


AAA1 1 AA1 AI AA1 1 A 
000 11 00 101 00 110 


1 1 nn inii a a 1 a a 1 n 1 n 
1 1 00 1011001 00 1010 


u 


*7 ACv/ 1 A"4 

7.05x10 


nn 1 nn i minim in 
00 1 UO 110101011U 


nm nni 1 inimmnm 
00 1 UU 111010101010 


> 


z' c ^ j. 1 rv-4 

6.56x10 


A A 1A1A1A1 1 1A1A1 
00 1010101110101 


1 A1 AAAAAAA1 1 1 1 A1 A 

101 0000000 1111010 




6.56x10 


A A 1A1A1 1A1 1 1A1 1 
00 101011011 1011 


AI 1 AAAA1 1 AAAAAA1 A 

01 100001 100000010 


> 

/ 


zr *5 a^„* 1 a~4 

6.39x10 


AA1 111 AI A1 1 AAAA 
00 11110101 10000 


111 AAI 1 AI 1 1 A1 AA1 A 

11100110111010010 


0 


yf A "4 

6.23x10 


AA1 1 1 1 A1 1 A A 1 1 1 A 
00 1111011 00 1110 


nnni nnni ni 1 m nm n 
000 1 000 101101 00 1 0 


o 

& 


Z* A/Tvy 1 A~4 

6.06x10 


A1 AA1 1111 AAA1 1 A 
01001 111 10001 10 


i nni m 1 aaaaaaaai n 
1 UU 1 U 1 1 00000000 1 0 


{ 


5.90x10 


A 1 A 1 AAA 1 1 A A 1 A 1 A 
0101 000 11001010 


A1A1A1 1 1 1A1 1 1 1 AI A 
01010111101111010 


r> 

r 


5.74x10 


A1 1 1 1A1A1 1A1 1 1 1 
01 1 110101 101 111 


1 1 A 1 AAAA 1 1A1A1A1A 
11010000110101010 


c 
J 


5.57X10" 4 


AI 1 1 1 1 1A1A1 1A1A 
01 1 1 1 1 10101 1010 


! AAI 111 AAI A1 AAI AI A 
001 1 1 100101001010 




4.42x10 


1 AAAAA1 AI 1 1 AA1 A 
1 00000 101110010 


1A111A1111 AA1 1 A1 A 
10 11 101111 00 11010 


o 

o 


yf y<_ , 1 /-\— 4 

4.26x10 


1A1A1 1 1 1A1 1A1 1A 
10101 1 1 101 101 10 


A1 111 AI AA1 1 1 AAAI A { 
011110100 11 100010 


7 


3.93x10 


1 1 A A 1A1 1 1A1 1A1A 
1 100101 1101 1010 


111111A1 AAA1 1 AA1 A 
11111 1010001 10010 


1 


3.77x10 


1 1A1 1 1 1 1A1A1A1A 

1 101 111 10101010 


AAAAAA1 AI 1 1 AAI 1 1 A 
000000101 1 1001 1 10 


0/ 


3.77x10 


1111 AA 1 A 1 A 1 A 1 A A 
111 100101010100 


1AAAA1 AI 1AAA1 1 1 1A 
10000101 10001 1 1 10 


rj 

H 


TT- . 1 y*\-4 

3.77x10 


AAAAA1 A1 A1 1 1A1 1A 

0000010101 110110 


A1 AAAI AAAAI 1 AA1 1 A 
01000100001 1001 10 




2.95x10 


AAAA1 1 AA1 1 AA1 AAA 

00001 1001 1001000 


1 1 AAAA1 1 A1 A1 1 A1 1 A 
1 100001 101 0110110 




2.29x10 


AAAA 1 1 1 AA1 1 A1 1 1 1 

00001 1 1001 101 111 


A A 1 A 1 1 1 1 A A 1 A 1 A 1 1 A 
00101111001010110 


j 


1.80x10 


AAA1 1 AA1 AI AI AA1 A 

000 1 1 00 1 0 1 0 1 00 1 0 


1 A1 AI AAAA1 AAAAI 1 A 
10101 0000 1 0000 110 


zr 
K 


1.64x10 


AA 1 A A 1 1 A 1 A 1 A A 1 1 A 

00 1 00 110101001 10 


A 1 1 A 1 A A 1 1 1 1 1 1 1 1 1 A 
01101 00 1111111110 


Q 


l. 47x10 


aa iAiiiiiiniinii 
00 10111111011011 


1 1 1 ai 1 1 ai nm m 1 1 n 
111011101 00 101110 


A 


1.47x10 


AA1 1 AA1 1 AA1 m 1 1 A 

00110011 00 101110 


AAAI 1 AAI AAA1 AI 1 1 n 

UUU1 1001000101 1 1U 


5> 


3.28x10 


AA 1 1 AI 1 AA1 AI A 1 1 A 

00 11011 00 1 0 1 0 1 1 0 


1 nm 111 nm 11111m 
1001 1 1 1001 1 1 1 1 1 10 


z 


1.64x10 


aai i i nnni nnni i in 
001 1 100010001 110 


At ai nni innnm in 
0101111111 0000 1 1 u 


J) 


i ^ y| . 1 rv-5 

1.64x10 


ai a aai iiiininiii 
010001 1111010111 


1 1 ni 1 AAAI A1A1A1 m 
11011 0UU 1U1U1U11U 


@ 


l. 64x10 


AiAiiiiAiiniiniA 
0101111011011010 


nni 1 m nm im ini m 

UUl 1U1UU1 1U1-1U1 1U i 


A 


1.64x10 


ai i AAAA1 A1 1 1 AI AI 

011 0000 101110101 


1 n 1 1 nn 1 1 1 n 1 1 nn 1 in 
1U1 1UU1 1 1U1 1UU1 1U 


>               1.64x10^-5    0111110110111011              01110010000011110
                1.64x10^-5    1001101110110110              11110101011001110
                1.64x10^-5    1010110110001010              00000011001100001
bel             1.64x10^-5    1011011011001010              10000100010110001
bs              1.64x10^-5    1100101001001011              01000101111001001
can             1.64x10^-5    1101111100101110              11000010100011001







1 .04X 1 U 


1 1 1 101 1001 101111 

111 1W1 1 \J\J 1 1 V 1 1 1 1 


00101 1 101 1 1 1 1 1001 

uu lvl 1 Ivl 11111 \J\J 1 


dr? 


1 A4y 1 CY^ 
1 .04X 1 U 


0000001 1 1 1 001 01 1 0 


10101001 100101001 

1 U 1 U 1 UU 1 1 \J\J 1 u 1 uu 1 


drl 


i .04xiu 


00001 01 1 1 1 1 01 1 01 1 

Uvuu lull 1 1 1 V 1 1U1 1 


01 1 01 000001 01 0001 
\j 1 1 \j 1 iui wuu 1 


drd 


1 AAy 1 O"^ 
1 .D4X 1 u 


001001 1 1 10100101 1 
i in ivi l vi i 


1110111101 0000001 

111U1111U1 Uv/UUvu 1 


drl 




00101 1 1 1 001 101110 
uu i u i ill uu 1 1 u i 1 1 u 


0001 10001 10000001 

UUU1 1UUU1 1UUUUUU1 


del          1.64×10^-5  00110100101010010   10011111101010001
dle          1.64×10^-5  00110111010110110   01011110000101001
em           1.64×10^-5  00111011110001110   11011001011111001
enq          1.64×10^-5  00111110101110101   00110101000011001
eot          1.64×10^-5  01000111011011010   10110010011001001
esc          1.64×10^-5  01001111110101010   01110011110110001
etb          1.64×10^-5  01101100010001110   11110100101100001
etx          1.64×10^-5  01111011011010110   00001011010011101




[illegible]  1.64×10^-5  01111101001011000   10001100001001101
[illegible]  1.64×10^-5  10001111001111011   01001101100110101
[illegible]  1.64×10^-5  10010111001001010   11001010111100101


nak          1.64×10^-5  10011010101010100   00100110100000101
np           1.64×10^-5  10011110110101110   10100001111010101
nul          1.64×10^-5  11010010101110111   01100000010101101
rs           1.64×10^-5  11110110011001000   11100111001111101
si           1.64×10^-5  11110111101101111   00010000101111101
so           1.64×10^-5  11111111111011011   10010111110101101


soh          1.64×10^-5  000001111110100110  01010110011010101
stx          1.64×10^-5  000010110101110111  11010001000000101
sub          1.64×10^-5  001011011110010110  00111101011100101
syn          1.64×10^-5  001011110011011010  10111010000110101
us           1.64×10^-5  001100111111011011  01111011101001101
vt           1.64×10^-5  001101111001001010  11111100110011101



Table A.4: Two VLEC codes for the 128-symbol ASCII source derived from a C-program 






Appendix B 

Two Algorithms to Calculate the Distance Spectrum of VLEC Codes 



Internal Report, Comms. Res. Group, University of Manchester, pp. 1-15, Apr. 1994. 






Appendix C 

Published Papers 






"A maximum likelihood decoding algorithm for variable-length error- 
correcting codes", 

Proc. 5th Bangor Symposium on Communications, 
Bangor, Wales, 
pp. 56-59, 
2-3 Jun. 1993. 






"On variable-length error-correcting codes", 
Proc. 1994 IEEE ISIT, 
Trondheim, Norway, 
p. 507, 
27 Jun. - 1 Jul. 1994. 






"Sequential decoding of variable-length error-correcting codes", 
Eurocode 94, 
Côte d'Or, France, 
pp. 93-98, 
24-28 Oct. 1994. 






"A maximum a-posteriori (MAP) decoding algorithm for variable-length 
error-correcting codes", 

Codes and cyphers: Cryptography and coding IV, 
Essex, England, 

The Institute of Mathematics and its Applications, 
pp. 103-119, 1995. 






Index 



— #— 

α-correcting, 47 
α-prompt, 48 

— A— 

additive white Gaussian noise (AWGN) channel, 87 
admissibility mapping, 46 
anticode, 147 



— B— 

base code, 53, 164 
BCH code, 166 
block codes, 27 
block distance, 74 
    minimum, 74 
    overall minimum, 74 
    undefined, 75 



— C— 

catastrophic, 77, 113 
channel encoder, 26 
code-anticode construction, 147 
codeword deletion, 160 
column distance function, 109, 126 
    evaluation, 113 
column permutations, 148 
combined source and channel coding, 28, 175 
comma, 39 
comma codes, 39 
comma-free codes, 40 
comma-free error-correcting codes of variable length, 184 
communication system, 24 
complementary error function, 89 
complexity, 100, 175 
    convolutional codes, 166 
    modified Viterbi algorithm, 101 
    stack algorithm, 123 
computer simulation, 86 
consecutive states, 68, 77 
constraint length, 28, 68, 76, 103, 166 
construction 
    α-prompt, 51 
    code-anticode, 147 
    heuristic, 152 
converging distance, 75 
convolutional codes, 27, 166 
coset leader, 157 

— D— 

data compression, 25 
decoding window depth, 99 
deletion errors, 136 
derived codes, 53 
    complete, 53 
    fixed-ratio, 53 
digital communication system, 25 
distance, 32 
    converging, 75 
    diverging, 74 
    invariant, 84 
    spectrum, 82 
    spectrum evaluation, 84 
diverging distance, 74 

— E— 

effective error span, 129 
    average, 130 
    probability distribution, 139 
    variation with cr, 139 
effective range of a codeword, 50 
energy per bit, 88 
entropy, 36 
equivalent VLEC codes, 173 
error event probability, 82, 129 
error mapping. See admissibility mapping 
error span, 129 
    average, 43, 131 
expanded code, 46 
extended code, 67 

— F— 

Fano metric, 55, 105 
fidelity criterion, 25 
finite state machine representation, 182 
finite state VLEC codes, 184 
first error event, 79 
    probability, 80 
forbidden states, 107 
free distance, 75 
    bound, 75 

— G— 

Gilbert lower bound, 50 
greedy algorithm, 153, 157 
    restricted, 158 

— H— 

Hamming distance. See distance 
Hamming sphere packing upper bound, 50 
Hamming weight, 32 
heuristic construction algorithm, 152 
horizontal 
    linearity, 144, 148 
    sub-codes, 144 

— K— 

Kraft inequality, 35, 50 



— L— 

length 
    average codeword, 32 
    path, 64 
Levenshtein distance, 58 

— M— 

majority voting algorithm, 158 
MAP 
    decoding, 92 
    factor, 94 
    metric, 71 
marker, 38 
Massey metric, 55, 104 
    modified, 56 
maximum likelihood decoding, 67 
McMillan inequality, 34 
minimum distance 
    block, 74 
    converging, 75 
    diverging, 74 
modification vector, 41, 148 
Monte Carlo simulation, 87 
multi-level encodings, 46 

— N — 

non-linearity, 144 



— O— 

overall minimum block distance, 74 

— P— 

parallel transitions, 68 
Parke Mathematical Laboratories, 29 
path 
    definition (in a tree), 64 
    definition (trellis), 67 
    extended, 68 
    length, 64 
    reference, 85 
    secondary, 86 
    span, 64 
path invariant comma-free codes, 41 
perfect code, 164 
prefix, 35 
prefix decoding, 49, 95, 164 
prefix decomposition, 48 
prefixed comma-free codes, 39 
proper prefix, 32 
proper suffix, 32 
properties 
    efficiency, 36 
    exhaustive, 36, 178 
    finite decoding delay, 33, 177 
    instantaneously decodable, 35, 177 
    non-singular, 33, 177 
    redundancy, 36 
    self-synchronising, 40 
    statistically synchronisable, 42 
    synchronisable with finite delay, 39, 177 
    uniquely decodable, 33, 177 

— R— 

rate, 88 

restricted greedy algorithm, 158 

— S— 

search and add algorithm, 157 
segment decoding, 52, 95, 145 
segment decomposition, 49, 145 
self-synchronising, 40 
separation theorem, 26 
sequential, 123 
sequential decoding, 106, 146, 170 
    average number of extended paths, 121 
    average number of paths which visit the top of the stack, 123 
    condition for correct decoding, 115 
    metric, 105 
    stack size, 107, 118, 126 
    stack-bucket algorithm, 124 
sequentially catastrophic codes, 113, 138 
Shannon's first theorem, 51 
simulated annealing, 173 
source encoder, 25 
spatial memory, 28 
stack algorithm. See sequential decoding 
stack size, 107, 118, 126 
state-splitting VLEC (SSVLEC) codes, 180 
statistically synchronisable, 42 
swap code, 163 
swap operation, 163 
symbol error probability, 57, 59, 129 
    approximate relation, 91 
    bound, 83 
sync pulse, 38 
synchronisation, 176 
    acquisition, 133 
    with deletion errors, 137 
    with insertion errors, 139 
synchronisation delay 
    average, 43 
    bounded, 42, 78 
synchronisation properties over BSC, 129 
synchronisation-error-correcting codes, 60 
synchronous, 42 

— T— 

tail decoding, 57, 95, 118 
T-codes, 44 
tree structure for VLEC codes, 64 
trellis decoding, 176 
trellis stage, 68 
trellis structure for VLEC codes, 67 
two-length error-correcting codes, 53, 98, 142, 164, 176 

— U— 

unequal error protection, 177 
unequal length free distance, 111, 132 
uniquely decodable, 103 

— V— 

vertical 
    linearity, 144 
    sub-codes, 144 
Viterbi decoding algorithm, 69, 146 

— W— 

weight. See Hamming weight 
weight spectrum, 84 






